Abstract

Recently, a new decoding rule called jar decoding was proposed, under which the decoder first forms a set of suitable size, called a jar, consisting of sequences from the channel input alphabet considered to be closely related to the received channel output sequence through the channel, and then takes any codeword from the jar as the estimate of the transmitted codeword; under jar decoding, a non-asymptotic achievable tradeoff between the coding rate and word error probability was also established for any discrete input memoryless channel with discrete or continuous output (DIMC). Along the path of non-asymptotic analysis, in this paper, it is further shown that jar decoding is actually optimal up to the second order coding performance by establishing new non-asymptotic converse coding theorems, and determining the (best) coding performance of finite block length for any block length $n$ and word error probability $\epsilon$ up to the second order. Specifically, a new converse proof technique dubbed the outer mirror image of jar is first presented and used to establish new non-asymptotic converse coding theorems for any encoding and decoding scheme. To determine the coding performance of finite block length for any block length $n$ and error probability $\epsilon$, a quantity $\delta_{t,n}(\epsilon)$ is then defined to measure the relative magnitude of the error probability $\epsilon$ and block length $n$ with respect to a given channel and an input distribution $t$. By combining the achievability of jar decoding and the new converses, it is demonstrated that when $\epsilon < 1/2$, the best channel coding rate $R_n(\epsilon)$ given $n$ and $\epsilon$ has a "Taylor-type expansion" with respect to $\delta_{t,n}(\epsilon)$, where the first two terms of the expansion are $\max_t[I(t;P) - \delta_{t,n}(\epsilon)]$, which is equal to $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ for some optimal distribution $t^*$, and the third order term of the expansion is $O(\delta_{t^*,n}^2(\epsilon))$ whenever $\delta_{t^*,n}(\epsilon) = \Omega(\sqrt{\ln n/n})$, thus implying the optimality of jar decoding up to the second order coding performance. Finally, based on the Taylor-type expansion and the new converses, two approximation formulas for $R_n(\epsilon)$ (dubbed "SO" and "NEP") are provided; they are further evaluated and compared against some of the best bounds known so far, as well as the normal approximation of $R_n(\epsilon)$ revisited recently in the literature. It turns out that while the normal approximation is all over the map, i.e., sometimes below achievable bounds and sometimes above converse bounds, the SO approximation is much more reliable as it is always below converses; in the meantime, the NEP approximation is the best among the three and always provides an accurate estimation for $R_n(\epsilon)$. An important implication arising from the Taylor-type expansion of $R_n(\epsilon)$ is that in the practical non-asymptotic regime, the optimal marginal codeword symbol distribution is not necessarily a capacity achieving distribution.

Index Terms 

Channel capacity, channel coding, jar decoding, non-asymptotic coding theorems, non-asymptotic 
equipartition properties, non-asymptotic information theory, Taylor-type expansion. 
I. Introduction

Recently, a new decoding rule called jar decoding was proposed in [1], [2], under which the decoder first forms a set of suitable size, called a jar, consisting of sequences from the channel input alphabet considered to be closely related to the received channel output sequence through the channel, and then takes any codeword from the jar as the estimate of the transmitted codeword. It was shown in [1] and [2] that under jar decoding, for any binary input memoryless channel with discrete or continuous output and with uniform capacity achieving distribution (BIMC), linear codes $C_n$ of block length $n$ with rate $R(C_n)$ and word error probability $P_e(C_n)$ exist such that

$$P_e(C_n) \le \left\{\bar{\xi}_H(X|Y,\lambda,n) + \frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}\right\} e^{-n r_{X|Y}(\delta)} \tag{1.1}$$

and

$$R(C_n) \ge C_{BIMC} - \delta - r_{X|Y}(\delta) + \frac{\ln\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)} - \frac{1}{2}\ln n}{n} \tag{1.2}$$

for any $\delta \in (0, \Delta^*(X|Y))$, where $C_{BIMC}$ is the capacity of the given BIMC, $\lambda = r'_{X|Y}(\delta)$, and all other quantities are defined later in Sections II and IV. Similar achievability results were also established in [2] for non-linear codes for any discrete input memoryless channel with discrete or continuous output (DIMC).

The achievability given in (1.1) and (1.2) is quite sharp. It implies [1], [2] that for any BIMC, there exist linear codes $C_n$ of block length $n$ such that

$$R(C_n) \ge C_{BIMC} - \sigma_H(X|Y)\sqrt{\frac{2\alpha\ln n}{n}} - \left(\alpha + \frac{1}{2}\right)\frac{\ln n}{n} + O\left(\frac{1}{n}\right) \tag{1.3}$$

while maintaining the word error probability

$$P_e(C_n) \le \frac{n^{-\alpha}}{2\sqrt{\pi\alpha\ln n}} + o\left(\frac{n^{-\alpha}}{\sqrt{\ln n}}\right) = \Theta\left(\frac{n^{-\alpha}}{\sqrt{\ln n}}\right) \tag{1.4}$$

and

$$R(C_n) \ge C_{BIMC} - \cdots \tag{1.5}$$

while maintaining the word error probability

$$P_e(C_n) \le Q\left(\frac{\cdots}{\sigma_H(X|Y)}\right) + \cdots\frac{M_H(X|Y)}{\sigma_H^3(X|Y)} \tag{1.6}$$

where $\sigma_H^2(X|Y)$ and $M_H(X|Y)$ are parameters related to the channel and specified in Section II,

$$Q(z) = \frac{1}{\sqrt{2\pi}}\int_z^{\infty} e^{-t^2/2}\,dt, \tag{1.7}$$

and $C_{BE} < 1$ is the universal constant in the Berry-Esseen central limit theorem. Furthermore, when the error probability is maintained constant in (1.6), the first two terms in (1.5) (i.e., $C_{BIMC}$ and the term of order $1/\sqrt{n}$) coincide with the asymptotic second order coding rate analysis in [3], [4], [5]. Consequently, jar decoding is shown to be second order optimal asymptotically when the error probability $\epsilon$ is maintained constant with respect to block length $n$.
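The tail function $Q$ in (1.7), and its inverse, recur throughout the bounds below. A minimal sketch in Python (standard library only; the helper names `Q` and `Q_inv` are illustrative, not from the paper) evaluates $Q$ through the complementary error function and $Q^{-1}$ through `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def Q(z: float) -> float:
    """Gaussian tail Q(z) = (1/sqrt(2*pi)) * integral_z^inf exp(-t^2/2) dt, cf. (1.7)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def Q_inv(eps: float) -> float:
    """Inverse of Q on (0, 1): Q(Q_inv(eps)) = eps, i.e. Q_inv(eps) = -Phi^{-1}(eps)."""
    return -NormalDist().inv_cdf(eps)

if __name__ == "__main__":
    for z in (0.0, 1.0, 3.0):
        print(z, Q(z))
    print(Q_inv(1e-3))  # roughly 3.09
```

Using `erfc` rather than `1 - Phi(z)` avoids cancellation for the small error probabilities of interest here.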

In the non-asymptotic regime, however, the concept of constant error probability with respect to block length $n$ is not applicable. For example, suppose that $n = 1000$ and the error probability $\epsilon$ is equal to $10^{-6}$. How would one interpret the relationship between $\epsilon$ and $n$ in this case? Does it make sense to interpret $\epsilon$ as a constant with respect to $n$? Or is it better to interpret $\epsilon$ as a polynomial function of $n$, namely, $\epsilon = n^{-2}$? Since $\epsilon$ is quite small relative to $n$, we believe that the latter interpretation makes much more sense in this particular case. In general, when both the error probability $\epsilon$ and block length $n$ are finite, what really matters is their relative magnitude. Therefore, it is interesting to see whether the achievability in (1.1) and (1.2) remains tight up to the second order in the non-asymptotic regime where both the error probability $\epsilon$ and block length $n$ are finite.

In this paper, we provide an affirmative answer to the above question. Specifically, we first present a new converse proof technique dubbed the outer mirror image of jar and use the technique to establish new non-asymptotic converse coding theorems for any binary input memoryless symmetric channel with discrete or continuous output (BIMSC) and any DIMC. We then introduce a quantity $\delta_{t,n}(\epsilon)$ to measure the relative magnitude of the error probability $\epsilon$ and block length $n$ with respect to a given channel and an input distribution $t$. By combining the achievability of jar decoding (see (1.1) and (1.2) in the case of BIMSC) with the new converses, we further show that when $\epsilon < 1/2$, the best channel coding rate $R_n(\epsilon)$ given $n$ and $\epsilon$ has a "Taylor-type expansion" with respect to $\delta_{t,n}(\epsilon)$ in a neighborhood of $\delta_{t,n}(\epsilon) = 0$, where the first two terms of the expansion are $\max_t[I(t;P) - \delta_{t,n}(\epsilon)]$, which is equal to $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ for some optimal distribution $t^*$, and the third order term of the expansion is $O(\delta_{t^*,n}^2(\epsilon))$ whenever $\delta_{t^*,n}(\epsilon) = \Omega(\sqrt{\ln n/n})$. Since the leading two terms in the achievability of jar decoding (see (1.2) in the case of BIMSC when $P_e(C_n) = \epsilon$) coincide with the first two terms of this Taylor-type expansion of $R_n(\epsilon)$, jar decoding is indeed optimal up to the second order coding performance in the non-asymptotic regime.

Finally, based on the Taylor-type expansion of $R_n(\epsilon)$ and our new non-asymptotic converses, we also derive two approximation formulas (dubbed "SO" and "NEP") for $R_n(\epsilon)$ in the non-asymptotic regime. The SO approximation formula consists only of the first two terms in the Taylor-type expansion of $R_n(\epsilon)$. On the other hand, in addition to the first two terms in the Taylor-type expansion of $R_n(\epsilon)$, the NEP approximation formula includes some higher order terms from our non-asymptotic converses as well. (Here, NEP stands for non-asymptotic equipartition properties established recently in [6], which underlie both the achievability bounds in (1.1) and (1.2) and our non-asymptotic converses.) These formulas are further evaluated and compared against some of the best bounds known so far, as well as the normal approximation of $R_n(\epsilon)$ in [5]. It turns out that while the normal approximation is all over the map, i.e., sometimes below achievability and sometimes above converse, the SO approximation is much more reliable as it is always below converses; in the meantime, the NEP approximation is the best among the three and always provides an accurate estimation for $R_n(\epsilon)$. An important implication arising from the Taylor-type expansion of $R_n(\epsilon)$ is that in the practical non-asymptotic regime, the optimal marginal codeword symbol distribution is not necessarily a capacity achieving distribution.

The rest of this paper is organized as follows. Non-asymptotic converses and the Taylor-type expansion of $R_n(\epsilon)$ for BIMSC and DIMC are established in Sections II and III, respectively. The SO and NEP approximation formulas are developed, numerically calculated, and compared against the normal approximation in Section IV for the binary symmetric channel (BSC), binary erasure channel (BEC), binary input additive Gaussian channel (BIAGC), and Z-channel. Finally, conclusions are drawn in Section V.

II. Non-Asymptotic Converse and Taylor-type Expansion: BIMSC 

Consider a BIMC $\{p(y|x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$, where $\mathcal{X} = \{0, 1\}$ is the channel input alphabet, and $\mathcal{Y}$ is the channel output alphabet, which is arbitrary and could be discrete or continuous. Throughout this section, let $X$ denote the uniform random variable on $\mathcal{X}$ and $Y$ the corresponding channel output of the BIMC in response to $X$. Then the capacity (in nats) of the BIMC is calculated by

$$C_{BIMC} = \ln 2 - H(X|Y) \tag{2.1}$$

where $H(X|Y)$ is the conditional entropy of $X$ given $Y$. Here and throughout the rest of the paper, $\ln$ stands for the logarithm with base $e$, and all information quantities are measured in nats. Further assume that the random variable $-\ln p(0|Y)$ given $X = 0$ and the random variable $-\ln p(1|Y)$ given $X = 1$ have the same distribution, where $p(0|Y)$ ($p(1|Y)$, respectively) denotes the conditional probability of $X = 0$ ($X = 1$, respectively) given $Y$. Such a BIMC is called a binary input memoryless symmetric channel (BIMSC). (It can be verified that the BSC, BEC, BIAGC, and general binary input symmetric output channels all belong to the class of BIMSC.) Under this assumption, we have

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta \,\middle|\, X^n = x^n\right\} = \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \tag{2.2}$$

for any $x^n \in \mathcal{X}^n$, where $Y^n$ is the output of the BIMSC in response to $X^n$, the $n$ independent copies of $X$. Throughout this paper, for any set $S$, we use $S^n$ to denote the set of all sequences of length $n$ drawn from $S$.
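To make the BIMSC condition and (2.1) concrete, the following sketch uses the BSC with crossover probability $p$ as an assumed example. With a uniform input, $p(x|y) = p(y|x)$, so the law of $-\ln p(X|Y)$ given $X = x$ can be listed directly and checked to be the same for $x = 0$ and $x = 1$; the function names are hypothetical.

```python
import math

def bsc_conditional_info_law(p: float, x: int) -> dict:
    """Law of -ln p(X|Y) given X = x for a BSC with crossover p and uniform input.

    With a uniform input, p(x|y) = p(y|x), so -ln p(x|y) is -ln(1-p) when y == x
    (probability 1-p) and -ln(p) when y != x (probability p).
    """
    law = {}
    for y in (0, 1):
        prob = 1 - p if y == x else p            # p(y|x)
        value = round(-math.log(prob), 12)       # -ln p(x|y) = -ln p(y|x)
        law[value] = law.get(value, 0.0) + prob
    return law

def bsc_capacity_nats(p: float) -> float:
    """C = ln 2 - H(X|Y) as in (2.1); for the BSC, H(X|Y) = h(p) in nats."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(2) - h

if __name__ == "__main__":
    p = 0.11
    # The BIMSC condition: same law of -ln p(X|Y) given X = 0 and given X = 1
    print(bsc_conditional_info_law(p, 0) == bsc_conditional_info_law(p, 1))
    print(bsc_capacity_nats(p) / math.log(2))  # capacity in bits, about 0.5
```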



A. Definitions 

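Before stating our converse channel coding theorem for the BIMSC, let us first introduce some definitions from [6]. Define

$$\lambda^*(X|Y) \triangleq \sup\left\{\lambda \ge 0 : \int \sum_{x\in\mathcal{X}} p(y)\,p^{1-\lambda}(x|y)\,dy < \infty\right\} \tag{2.3}$$

where $\int dy$ is understood throughout this paper to be the summation over $\mathcal{Y}$ if $\mathcal{Y}$ is discrete. Suppose that

$$\lambda^*(X|Y) > 0. \tag{2.4}$$

Define for any $\delta \ge 0$

$$r_{X|Y}(\delta) \triangleq \sup_{\lambda \ge 0}\left[\lambda\left(H(X|Y) + \delta\right) - \ln\sum_{x\in\mathcal{X}}\int p(y)\,p^{1-\lambda}(x|y)\,dy\right]. \tag{2.5}$$

For any $\lambda \in [0, \lambda^*(X|Y))$, let $X_\lambda$ and $Y_\lambda$ be random variables under the joint distribution $p(x,y)f_\lambda(x,y)$, where

$$f_\lambda(x,y) \triangleq \frac{p^{-\lambda}(x|y)}{\sum_{x'\in\mathcal{X}}\int p(y')\,p^{1-\lambda}(x'|y')\,dy'}. \tag{2.6}$$

Further define

$$\delta(\lambda) \triangleq E[-\ln p(X_\lambda|Y_\lambda)] - H(X|Y) \tag{2.7}$$

$$\Delta^*(X|Y) \triangleq \lim_{\lambda\uparrow\lambda^*(X|Y)} \delta(\lambda) \tag{2.8}$$

$$\sigma_H^2(X|Y,\lambda) \triangleq \mathrm{Var}[-\ln p(X_\lambda|Y_\lambda)] = E\left[\left(-\ln p(X_\lambda|Y_\lambda) - E[-\ln p(X_\lambda|Y_\lambda)]\right)^2\right] \tag{2.9}$$

$$M_H(X|Y,\lambda) \triangleq M_3[-\ln p(X_\lambda|Y_\lambda)] = E\left[\left|-\ln p(X_\lambda|Y_\lambda) - E[-\ln p(X_\lambda|Y_\lambda)]\right|^3\right] \tag{2.10}$$

and

$$\hat{M}_H(X|Y,\lambda) \triangleq \hat{M}_3[-\ln p(X_\lambda|Y_\lambda)] = E\left[\left(-\ln p(X_\lambda|Y_\lambda) - E[-\ln p(X_\lambda|Y_\lambda)]\right)^3\right] \tag{2.11}$$

where $E[\cdot]$, $\mathrm{Var}[\cdot]$, $M_3[\cdot]$, and $\hat{M}_3[\cdot]$ are respectively the expectation, variance, third absolute central moment, and third central moment operators on random variables; we write $M_H(X|Y,0)$ as $M_H(X|Y)$, $\hat{M}_H(X|Y,0)$ as $\hat{M}_H(X|Y)$, and $\sigma_H^2(X|Y,0)$ as $\sigma_H^2(X|Y)$. Clearly, $\sigma_H^2(X|Y)$, $M_H(X|Y)$, and $\hat{M}_H(X|Y)$ are the variance, third absolute central moment, and third central moment of $-\ln p(X|Y)$. In particular, $\sigma_H^2(X|Y)$ is referred to as the conditional information variance of $X$ given $Y$ in [6]. Assume that

$$\sigma_H^2(X|Y) > 0 \quad\text{and}\quad M_H(X|Y) = M_3[-\ln p(X|Y)] < \infty. \tag{2.12}$$

Then it follows from [6] that $r_{X|Y}(\delta)$ is strictly increasing, convex, and continuously differentiable up to at least the third order inclusive over $\delta \in [0, \Delta^*(X|Y))$, and furthermore has the following parametric expression

$$r_{X|Y}(\delta(\lambda)) = \lambda\left(H(X|Y) + \delta(\lambda)\right) - \ln\sum_{x\in\mathcal{X}}\int p(y)\,p^{1-\lambda}(x|y)\,dy \tag{2.13}$$

with $\delta(\lambda)$ defined in (2.7) and $\lambda = r'_{X|Y}(\delta)$. In addition, let

$$\bar{\xi}_H(X|Y,\lambda,n) \triangleq \cdots + e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\left[Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - Q\left(\rho^* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)\right] \tag{2.14}$$

$$\underline{\xi}_H(X|Y,\lambda,n) \triangleq e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\,Q\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) \tag{2.15}$$

with

$$Q(\rho^*) = \frac{C_{BE}M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)} \quad\text{and}\quad Q(\rho_*) = \frac{1}{2} - \frac{2C_{BE}M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}.$$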
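For a concrete channel, the tilted quantities (2.6) to (2.11) and the parametric form (2.13) reduce to a two-point computation. The sketch below assumes a BSC with crossover probability $p$ and uniform input, for which $-\ln p(X|Y)$ equals $-\ln(1-p)$ with probability $1-p$ and $-\ln p$ with probability $p$; `bsc_tilted` is an illustrative helper, not code from the paper.

```python
import math

def bsc_tilted(p: float, lam: float):
    """Tilted quantities (2.6)-(2.13) for a BSC(p) with uniform input (assumed example).

    Z = -ln p(X|Y) takes a = -ln(1-p) w.p. 1-p and b = -ln p w.p. p. Under the tilted
    law p(x,y) f_lam(x,y), Z takes a w.p. (1-p)^(1-lam)/N and b w.p. p^(1-lam)/N, where
    N = (1-p)^(1-lam) + p^(1-lam) is the normalizer appearing in (2.5) and (2.6).
    Returns (delta(lam), sigma_H^2(X|Y,lam), M_hat_H(X|Y,lam), r_{X|Y}(delta(lam))).
    """
    a, b = -math.log(1 - p), -math.log(p)
    wa, wb = (1 - p) ** (1 - lam), p ** (1 - lam)
    norm = wa + wb
    qa, qb = wa / norm, wb / norm
    H = (1 - p) * a + p * b                                   # H(X|Y) in nats
    mean = qa * a + qb * b                                    # E[-ln p(X_lam|Y_lam)]
    sigma2 = qa * (a - mean) ** 2 + qb * (b - mean) ** 2      # (2.9)
    m3_hat = qa * (a - mean) ** 3 + qb * (b - mean) ** 3      # (2.11)
    delta = mean - H                                          # (2.7)
    r = lam * (H + delta) - math.log(norm)                    # (2.13)
    return delta, sigma2, m3_hat, r

if __name__ == "__main__":
    for lam in (0.0, 0.1, 0.3):
        print(lam, bsc_tilted(0.11, lam))
```

For instance, `bsc_tilted(0.11, 0.3)` returns $\delta(0.3)$, $\sigma_H^2(X|Y,0.3)$, $\hat{M}_H(X|Y,0.3)$, and $r_{X|Y}(\delta(0.3))$; numerically, $d\delta/d\lambda$ agrees with $\sigma_H^2(X|Y,\lambda)$ and $d\sigma_H^2(X|Y,\lambda)/d\lambda$ with $\hat{M}_H(X|Y,\lambda)$.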

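The significance of the above quantities related to the channel can be seen from Theorem 4 in [6], summarized as below:

(a) There exists a $\delta^* > 0$ such that for any $\delta \in (0, \delta^*]$,

$$r_{X|Y}(\delta) = \frac{\delta^2}{2\sigma_H^2(X|Y)} - \frac{\hat{M}_H(X|Y)}{6\sigma_H^6(X|Y)}\delta^3 + O(\delta^4). \tag{2.16}$$

(b) For any $\delta \in (0, \Delta^*(X|Y))$ and any positive integer $n$,

$$\bar{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} \ge \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \ge \underline{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} \tag{2.17}$$

where $\lambda = r'_{X|Y}(\delta) > 0$. Moreover, when $\delta = o(1)$ and $\delta = \omega(1/\sqrt{n})$,

$$\bar{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)(1 + o(1)) \tag{2.18}$$

$$\underline{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)(1 - o(1)) \tag{2.19}$$

and

$$e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) = \Theta\left(\frac{1}{\sqrt{n}\,\delta}\right) \tag{2.20}$$

with $\lambda = r'_{X|Y}(\delta) = \Theta(\delta)$.

(c) For any $\delta \le c\sqrt{\frac{\ln n}{n}}$, where $c < \sigma_H(X|Y)$ is a constant,

$$\cdots \tag{2.21}$$

Define for any $x^n \in \mathcal{X}^n$

$$B(x^n,\delta) \triangleq \left\{y^n : \infty > -\frac{1}{n}\ln p(x^n|y^n) > H(X|Y) + \delta\right\} \tag{2.22}$$

and

$$B_{n,\delta} \triangleq \bigcup_{x^n\in\mathcal{X}^n} B(x^n,\delta). \tag{2.23}$$

Since for any $y^n \in \mathcal{Y}^n$, the set

$$\left\{x^n \in \mathcal{X}^n : -\frac{1}{n}\ln p(x^n|y^n) \le H(X|Y) + \delta\right\} \tag{2.24}$$

is referred to as a BIMC jar for $y^n$ in [1], [2], we shall call $B(x^n,\delta)$ the outer mirror image of jar corresponding to $x^n$. Moreover, define for any set $B \subseteq \mathcal{Y}^n$

$$P(B) \triangleq \Pr\{Y^n \in B\} \tag{2.25}$$

$$P_{x^n}(B) \triangleq \Pr\{Y^n \in B \mid X^n = x^n\}. \tag{2.26}$$

It is easy to see that

$$P_{x^n}(B(x^n,\delta)) = \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta \,\middle|\, X^n = x^n\right\} = \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \tag{2.27}$$

where the last equality is due to (2.2).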



B. Converse Coding Theorem 

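We are now ready to state our non-asymptotic converse coding theorem for BIMSCs.

Theorem 1. Given a BIMSC, for any channel code $C_n$ of block length $n$ with average word error probability $P_e(C_n) = \epsilon_n$,

$$R(C_n) \le C_{BIMSC} - \delta - \frac{\ln\epsilon_n - \ln P(B_{n,\delta}) + \ln\frac{-2\ln\epsilon_n}{\sigma_H^2(X|Y)\,n} - \ln\left(1 + \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)}{n} \tag{2.28}$$

where $\delta$ is the largest number such that

$$\left(1 + \frac{2}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)\epsilon_n \le \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\}. \tag{2.29}$$

Moreover, the following hold:

1) 
$$R(C_n) \le C_{BIMSC} - \delta - \frac{\ln\epsilon_n - \ln P(B_{n,\delta}) + \ln\frac{-2\ln\epsilon_n}{\sigma_H^2(X|Y)\,n} - \ln\left(1 + \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)}{n} \tag{2.30}$$
where $\delta$ is the solution to
$$\underline{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} = \left(1 + \frac{2}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)\epsilon_n \tag{2.31}$$
with $\delta(\lambda) = \delta$.

2) When $\epsilon_n = \frac{e^{-n^{\alpha}}}{2\sqrt{\pi}\,n^{\alpha/2}}\left(1 - \frac{1}{2n^{\alpha}}\right)$ for $\alpha \in (0,1)$,
$$R(C_n) \le C_{BIMSC} - \sqrt{2}\,\sigma_H(X|Y)\,n^{-\frac{1-\alpha}{2}} + O\left(n^{-(1-\alpha)}\right). \tag{2.32}$$

3) When $\epsilon_n = \frac{n^{-\alpha}}{2\sqrt{\pi\alpha\ln n}}\left(1 - \frac{1}{2\alpha\ln n}\right)$ for $\alpha > 0$,
$$R(C_n) \le C_{BIMSC} - \sigma_H(X|Y)\sqrt{\frac{2\alpha\ln n}{n}} + O\left(\frac{\ln n}{n}\right). \tag{2.33}$$

4) When $\epsilon_n = \epsilon$ satisfying $\epsilon + \frac{1}{\sqrt{n}}\left(\frac{2\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right) < 1$,
$$R(C_n) \le C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}\left(\epsilon + \frac{1}{\sqrt{n}}\left(\frac{2\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right)\right) - \frac{\ln\epsilon + \ln\frac{-2\ln\epsilon}{\sigma_H^2(X|Y)\,n} - \ln\left(1 + \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon}{n}}\right)}{n} \tag{2.34}$$
$$= C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon) + \frac{\ln n}{n} + O(n^{-1}). \tag{2.35}$$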

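Proof: Assume that the message $M$ is uniformly distributed in $\{1, 2, \ldots, e^{nR(C_n)}\}$, $x^n(m)$ is the codeword corresponding to the message $m$, and $\epsilon_{m,n}$ is the conditional error probability given message $m$. Then

$$\epsilon_n = E[\epsilon_{M,n}]. \tag{2.36}$$

Let

$$\mathcal{M} \triangleq \{m : \epsilon_{m,n} < \epsilon_n(1 + \beta_n)\} \tag{2.37}$$

where $\beta_n > 0$ will be specified later. By the Markov inequality,

$$\Pr\{M \in \mathcal{M}\} \ge \frac{\beta_n}{1+\beta_n} \quad\text{and}\quad |\mathcal{M}| \ge e^{nR(C_n) + \ln\frac{\beta_n}{1+\beta_n}}. \tag{2.38}$$

Denote the decision region for message $m$ as $D_m$. Then for any $m \in \mathcal{M}$,

$$P_{x^n(m)}(B(x^n(m),\delta) \cap D_m) = P_{x^n(m)}(B(x^n(m),\delta)) - P_{x^n(m)}(B(x^n(m),\delta) \cap D_m^c)$$
$$\ge P_{x^n(m)}(B(x^n(m),\delta)) - \epsilon_{m,n}$$
$$> P_{x^n(m)}(B(x^n(m),\delta)) - \epsilon_n(1+\beta_n)$$
$$= \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} - \epsilon_n(1+\beta_n) \tag{2.39}$$

where the last equality is due to (2.27). At this point, we select $\delta$ such that

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \ge \epsilon_n(1 + 2\beta_n). \tag{2.40}$$

Substituting (2.40) into (2.39), we have

$$P_{x^n(m)}(B(x^n(m),\delta) \cap D_m) > \beta_n\epsilon_n. \tag{2.41}$$

By the fact that the regions $D_m$ are disjoint for different $m$ and

$$\bigcup_{m\in\mathcal{M}}\left(B(x^n(m),\delta) \cap D_m\right) \subseteq B_{n,\delta} \tag{2.42}$$

we have

$$P(B_{n,\delta}) \ge \sum_{m\in\mathcal{M}}\int_{B(x^n(m),\delta)\cap D_m} p(y^n)\,dy^n = \sum_{m\in\mathcal{M}}\int_{B(x^n(m),\delta)\cap D_m}\frac{p(y^n|x^n(m))\,p(x^n(m))}{p(x^n(m)|y^n)}\,dy^n$$
$$\overset{1)}{\ge} \sum_{m\in\mathcal{M}} e^{n(-C_{BIMSC}+\delta)}\int_{B(x^n(m),\delta)\cap D_m} p(y^n|x^n(m))\,dy^n \overset{2)}{\ge} \sum_{m\in\mathcal{M}} e^{n(-C_{BIMSC}+\delta)}\beta_n\epsilon_n = |\mathcal{M}|\,e^{n(-C_{BIMSC}+\delta)}\beta_n\epsilon_n \tag{2.43}$$

where the inequality 1) is due to the definition of $B(x^n,\delta)$ given in (2.22) together with $p(x^n) = 2^{-n}$ and (2.1), and the inequality 2) follows from (2.41). From (2.43), it follows that

$$|\mathcal{M}| \le \frac{P(B_{n,\delta})}{\beta_n\epsilon_n}\,e^{n(C_{BIMSC}-\delta)}. \tag{2.44}$$

Then combining (2.38) and (2.44) yields

$$R(C_n) \le C_{BIMSC} - \delta - \frac{\ln\frac{\beta_n}{1+\beta_n}}{n} - \frac{\ln\beta_n}{n} - \frac{\ln\epsilon_n - \ln P(B_{n,\delta})}{n}. \tag{2.45}$$

By letting $\beta_n = \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}$, (2.28) and (2.29) directly come from (2.40) and (2.45).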

1) By (2.17) shown in [6], selecting $\delta$ to be the solution to (2.31) will make (2.40) satisfied, and therefore (2.30) is proved.

2) Towards proving (2.32), we want to show that by making $\delta = \sqrt{2}\,\sigma_H(X|Y)\,n^{-\frac{1-\alpha}{2}} - \eta\,n^{-(1-\alpha)}$ for some constant $\eta$,

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \ge \left(1 + \frac{2}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)\epsilon_n \tag{2.46}$$

with $\epsilon_n$ as given in 2). Then the proof follows essentially the same approach as that of (2.33), shown below in detail.



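3) Apply the trivial bound $P(B_{n,\delta}) \le 1$. Then to show (2.33), we only have to show that $\delta = \sigma_H(X|Y)\sqrt{\frac{2\alpha\ln n}{n}} - \eta\frac{\ln n}{n}$ for some constant $\eta$ can make

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \ge \underline{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} \ge \left(1 + \eta_0\sqrt{\frac{\ln n}{n}}\right)\frac{n^{-\alpha}}{2\sqrt{\pi\alpha\ln n}}\left(1 - \frac{1}{2\alpha\ln n}\right) \tag{2.47}$$

satisfied, where $\lambda = r'_{X|Y}(\delta)$ and $\eta_0$ is a constant such that

$$\frac{2}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}} \le \eta_0\sqrt{\frac{\ln n}{n}}.$$

Towards this, recall (2.16), (2.19), and (2.20). Expanding $r_{X|Y}(\delta)$ by (2.16) at the above $\delta$ gives

$$e^{-n r_{X|Y}(\delta)} \ge n^{-\alpha}\exp\left\{\left(\frac{\sqrt{2\alpha}\,\eta}{\sigma_H(X|Y)} - \eta_1\right)\sqrt{\frac{\ln^3 n}{n}}\right\} \tag{2.48}$$

for some constant $\eta_1$, and

$$\underline{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)$$
$$\ge \frac{e^{-\frac{\rho_*^2 + 2\rho_*\sqrt{n}\lambda\sigma_H(X|Y,\lambda)}{2}}}{\sqrt{2\pi}\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)}\left(1 - \frac{1}{\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)^2}\right) \tag{2.49}$$
$$\ge \frac{1}{2\sqrt{\pi\alpha\ln n}}\left(1 - \frac{1}{2\alpha\ln n}\right)\left(1 - \eta_2\sqrt{\frac{\ln n}{n}}\right) \tag{2.50}$$

for another constant $\eta_2$, where $\rho_* = Q^{-1}\left(\frac{1}{2} - \frac{2C_{BE}M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}\right) = \Theta\left(\frac{1}{\sqrt{n}}\right)$, and we utilize the facts that $\lambda = r'_{X|Y}(\delta) = \Theta(\delta)$ and $\sigma_H(X|Y,\lambda) = \sigma_H(X|Y) \pm O(\lambda)$. Then (2.47) is satisfied by choosing a constant $\eta$ such that

$$\left(1 + \left(\frac{\sqrt{2\alpha}\,\eta}{\sigma_H(X|Y)} - \eta_1\right)\sqrt{\frac{\ln^3 n}{n}}\right)\left(1 - \eta_2\sqrt{\frac{\ln n}{n}}\right) \ge 1 + \eta_0\sqrt{\frac{\ln n}{n}} \tag{2.51}$$

for the constants $\eta_0$, $\eta_1$, and $\eta_2$ above.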



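4) According to (2.40), we should select $\delta$ such that

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \ge \left(1 + \frac{2}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon}{n}}\right)\epsilon. \tag{2.54}$$

Then by (2.21),

$$\delta = \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}\left(\epsilon + \frac{1}{\sqrt{n}}\left(\frac{2\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right)\right) \tag{2.55}$$

will guarantee (2.54). Consequently, (2.34) is proved by substituting (2.55) and $\beta_n = \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon}{n}}$ into (2.45) and applying the trivial bound $P(B_{n,\delta}) \le 1$, and (2.35) follows from the fact that

$$Q^{-1}\left(\epsilon + \frac{1}{\sqrt{n}}\left(\frac{2\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right)\right) = Q^{-1}(\epsilon) + O\left(\frac{1}{\sqrt{n}}\right). \tag{2.56}$$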
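For the BSC, the probability in (2.29) is an exact binomial tail, since $-\ln p(X^n|Y^n) = K\ln\frac{1-p}{p} - n\ln(1-p)$ with $K \sim \mathrm{Binomial}(n,p)$ the number of crossovers. The sketch below (a hypothetical helper assuming $0 < p < 1/2$ and a small $\epsilon$) evaluates the converse (2.28) with the choice of $\beta_n$ from the proof of Theorem 1 and the trivial bound $P(B_{n,\delta}) \le 1$; it is an illustration, not the evaluation procedure used in Section IV.

```python
import math

def bsc_converse_2_28(n: int, p: float, eps: float) -> float:
    """Evaluate the right-hand side of (2.28), in nats, for a BSC(p) with 0 < p < 1/2.

    Assumptions: P(B_{n,delta}) <= 1 is used (cf. Remark 2), beta_n is the choice made
    in the proof of Theorem 1, and eps is small enough that eps*(1+2*beta_n) <= 1.
    """
    H = -p * math.log(p) - (1 - p) * math.log(1 - p)      # H(X|Y) in nats
    C = math.log(2) - H                                   # (2.1)
    L = math.log((1 - p) / p)
    sigma = L * math.sqrt(p * (1 - p))                    # sigma_H(X|Y) for the BSC
    beta = math.sqrt(-2 * math.log(eps) / n) / sigma      # beta_n
    target = eps * (1 + 2 * beta)                         # left-hand side of (2.29)

    # Pr{K >= k} for K ~ Binomial(n, p), accumulated from the top
    logpmf = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
              + k * math.log(p) + (n - k) * math.log1p(-p) for k in range(n + 1)]
    tail = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + math.exp(logpmf[k])

    # -(1/n) ln p(X^n|Y^n) > H + delta  <=>  K > n (H + delta + ln(1-p)) / L.
    # The largest admissible delta (a supremum) puts this threshold at j, the largest
    # integer with Pr{K >= j} >= target.
    j = max(k for k in range(n + 1) if tail[k] >= target)
    delta = j * L / n - math.log(1 - p) - H
    # (2.28) with -ln P(B_{n,delta}) dropped; ln(-2 ln eps / (sigma^2 n)) = 2 ln beta
    numerator = math.log(eps) + 2 * math.log(beta) - math.log(1 + beta)
    return C - delta - numerator / n

if __name__ == "__main__":
    print(bsc_converse_2_28(1000, 0.11, 1e-3) / math.log(2))  # bound in bits
```

The supremum of admissible $\delta$ in (2.29) is not attained for a lattice distribution, but with $P(B_{n,\delta}) \le 1$ the bound is continuous in $\delta$, so plugging in the supremum remains valid.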



Remark 1. It is clear that the above converse proof technique depends heavily on the concept of the outer mirror image of jar corresponding to codewords. For future reference, it is convenient to loosely call such a converse proof technique the outer mirror image of jar.
Remark 2. In general, the evaluation of $P(B_{n,\delta})$ may not be feasible, in which case the trivial bound $P(B_{n,\delta}) \le 1$ can be applied without affecting the second order performance in the non-exponential error probability regime, as shown above. However, there are cases where $P(B_{n,\delta})$ can be tightly bounded (e.g., the BEC, shown in Section IV).

Remark 3. For the bound (2.34), when $\epsilon$ is small with respect to $\frac{C_{BE}M_H(X|Y)}{\sqrt{n}\,\sigma_H^3(X|Y)}$, the latter term (the estimation error that comes from the Berry-Esseen central limit theorem) will be dominant; in this case, (2.34) is loose.



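Remark 4. The choice $\beta_n = \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}$ in the proof of Theorem 1 is not arbitrary. Actually, it is optimal when $\delta$ is small in the sense of minimizing the upper bound (2.45), in which $\delta$ depends on $\beta_n$ through (2.40). To derive the expression for $\beta_n$, the following approximations can be adopted when $\delta$ is small:

$$\ln\left[\epsilon_n(1+2\beta_n)\right] \approx -\frac{n\delta^2}{2\sigma_H^2(X|Y)} \tag{2.57}$$

$$\frac{d\delta}{d\beta_n} \approx -\frac{2\sigma_H^2(X|Y)}{n\delta(1+2\beta_n)} \tag{2.58}$$

$$\delta \approx \sqrt{\frac{-2\sigma_H^2(X|Y)\ln\epsilon_n}{n}} \tag{2.59}$$

where (2.57) and (2.58) can be developed from (2.16) and (2.17). Setting the derivative of the right-hand side of (2.45) with respect to $\beta_n$ to zero and using (2.58) and (2.59) for small $\beta_n$ then gives $\beta_n \approx \frac{\delta}{\sigma_H^2(X|Y)} \approx \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}$.

By reviewing the proof of Theorem 1, it is not hard to reach the following corollary.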

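Corollary 1. Given a BIMSC, for any channel code $C_n$ of block length $n$ with maximum error probability $P_m(C_n) = \epsilon_n$,

$$R(C_n) \le C_{BIMSC} - \delta - \frac{\ln\epsilon_n - \ln P(B_{n,\delta}) + \ln\left(\frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)}{n} \tag{2.60}$$

where $\delta$ is the largest number such that

$$\left(1 + \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)\epsilon_n \le \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\}. \tag{2.61}$$

Moreover, the following hold:

1) 
$$R(C_n) \le C_{BIMSC} - \delta - \frac{\ln\epsilon_n + \ln\left(\frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)}{n} \tag{2.62}$$
where $\delta$ is the solution to
$$\underline{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} = \left(1 + \frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon_n}{n}}\right)\epsilon_n \tag{2.63}$$
with $\delta(\lambda) = \delta$.

2) When $\epsilon_n = \epsilon$ satisfying $\epsilon + \frac{1}{\sqrt{n}}\left(\frac{\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right) < 1$,
$$R(C_n) \le C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}\left(\epsilon + \frac{1}{\sqrt{n}}\left(\frac{\sqrt{-2\ln\epsilon}}{\sigma_H(X|Y)}\epsilon + \frac{C_{BE}M_H(X|Y)}{\sigma_H^3(X|Y)}\right)\right) - \frac{\ln\epsilon + \ln\left(\frac{1}{\sigma_H(X|Y)}\sqrt{\frac{-2\ln\epsilon}{n}}\right)}{n} \tag{2.64}$$
$$= C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon) + \frac{\ln n}{2n} + O(n^{-1}). \tag{2.65}$$

Remarks 2, 3, and 4 also apply to Corollary 1.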


C. Taylor-type Expansion 

Fix a BIMSC. For any block length $n$ and average error probability $\epsilon$, let $R_n(\epsilon)$ be the best coding rate achievable with block length $n$ and average error probability $\le \epsilon$, i.e.,

$$R_n(\epsilon) \triangleq \max\{R(C_n) : C_n \text{ is a channel code of block length } n \text{ with } P_e(C_n) \le \epsilon\}. \tag{2.66}$$

In this subsection, we combine the non-asymptotic achievability given in (1.1) and (1.2) with the non-asymptotic converses given in (2.28) to (2.31) to derive a Taylor-type expansion of $R_n(\epsilon)$ in the non-asymptotic regime where both $n$ and $\epsilon$ are finite. As mentioned earlier, when both $n$ and $\epsilon$ are finite, what really matters is the relative magnitude of $\epsilon$ and $n$. As such, we begin by introducing a quantity $\delta_n(\epsilon)$ to measure the relative magnitude of $\epsilon$ and $n$ with respect to the given BIMSC.

A close look at the non-asymptotic achievability given in (1.1) and (1.2) and the non-asymptotic converses given in (2.28) to (2.31) reveals that

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\}$$

is crucial in both cases. According to (2.18) and (2.19),

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \approx e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)e^{-n r_{X|Y}(\delta)} \triangleq g_{X|Y,n}(\delta) \tag{2.67}$$

where $\lambda = r'_{X|Y}(\delta)$. Consequently, we would like to define $\delta_n(\epsilon)$ as the solution to

$$g_{X|Y,n}(\delta) = \epsilon \tag{2.68}$$

given $n$ and $\epsilon < 1/2$, where the uniqueness of the solution in a certain range is shown in Lemma 1.
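As a worked check of (2.67) at the left end of its range: at $\delta = 0$ one has $\lambda = r'_{X|Y}(0) = 0$ and $r_{X|Y}(0) = 0$, so every exponential factor equals one:

```latex
g_{X|Y,n}(0) = e^{0}\,Q(0)\,e^{-n\cdot 0} = Q(0) = \frac{1}{2}.
```

Hence, with the monotonicity established in Lemma 1, solutions of (2.68) with $\delta \ge 0$ correspond to $\epsilon \le 1/2$.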

Lemma 1. There exists $\delta^+ > 0$ such that for any $n > 0$, $g_{X|Y,n}(\delta)$ is a strictly decreasing function of $\delta$ over $\delta \in [0, \delta^+]$.



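Proof: Since $\lambda = r'_{X|Y}(\delta)$, it follows from (2.7) and (2.13) that $g_{X|Y,n}(\delta) = g_{X|Y,n}(\delta(\lambda))$ is a function of $\lambda$ through $\delta = \delta(\lambda)$. (For details about the properties of $\delta(\lambda)$ and $r_{X|Y}(\delta)$, please see [6].) Moreover, by the fact that $\delta(0) = 0$ and $\delta(\lambda)$ is a strictly increasing function of $\lambda$, the proof of this lemma is yielded by analyzing the derivative of $g_{X|Y,n}(\delta(\lambda))$ with respect to $\lambda$ around $\lambda = 0$. Towards this,

$$\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} = \frac{d}{d\lambda}\left[e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)\right]e^{-n r_{X|Y}(\delta(\lambda))} - e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)e^{-n r_{X|Y}(\delta(\lambda))}\frac{d}{d\lambda}\left(n r_{X|Y}(\delta(\lambda))\right)$$
$$= e^{-n r_{X|Y}(\delta(\lambda))}\left[\left(x e^{\frac{x^2}{2}}Q(x) - \frac{1}{\sqrt{2\pi}}\right)\frac{dx}{d\lambda} - e^{\frac{x^2}{2}}Q(x)\,n\left.\frac{d r_{X|Y}(\delta)}{d\delta}\right|_{\delta=\delta(\lambda)}\frac{d\delta(\lambda)}{d\lambda}\right] \tag{2.69}$$

where $x = \sqrt{n}\lambda\sigma_H(X|Y,\lambda)$. On one hand,

$$\frac{dx}{d\lambda} = \sqrt{n}\left(\sigma_H(X|Y,\lambda) + \lambda\frac{d\sigma_H(X|Y,\lambda)}{d\lambda}\right) = \sqrt{n}\left(\sigma_H(X|Y,\lambda) + \frac{\lambda}{2\sigma_H(X|Y,\lambda)}\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}\right). \tag{2.70}$$

On the other hand,

$$\left.\frac{d r_{X|Y}(\delta)}{d\delta}\right|_{\delta=\delta(\lambda)} = \lambda \tag{2.71}$$

$$\frac{d\delta(\lambda)}{d\lambda} = \sigma_H^2(X|Y,\lambda) \tag{2.72}$$

which further implies

$$e^{\frac{x^2}{2}}Q(x)\,n\left.\frac{d r_{X|Y}(\delta)}{d\delta}\right|_{\delta=\delta(\lambda)}\frac{d\delta(\lambda)}{d\lambda} = e^{\frac{x^2}{2}}Q(x)\,n\lambda\sigma_H^2(X|Y,\lambda) = \sqrt{n}\sigma_H(X|Y,\lambda)\,x e^{\frac{x^2}{2}}Q(x). \tag{2.73}$$

Substituting (2.70) and (2.73) into (2.69), we have

$$\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} = \frac{\sqrt{n}\sigma_H(X|Y,\lambda)\,e^{-n r_{X|Y}(\delta(\lambda))}}{\sqrt{2\pi}}\left[\left(\sqrt{2\pi}\,x e^{\frac{x^2}{2}}Q(x) - 1\right)\frac{\lambda\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}}{2\sigma_H^2(X|Y,\lambda)} - 1\right]. \tag{2.74}$$

Note that, by the standard Gaussian tail bounds $\frac{x}{\sqrt{2\pi}(1+x^2)}e^{-\frac{x^2}{2}} \le Q(x) < \frac{1}{\sqrt{2\pi}x}e^{-\frac{x^2}{2}}$,

$$\frac{x^2}{1+x^2} \le \sqrt{2\pi}\,x e^{\frac{x^2}{2}}Q(x) < 1. \tag{2.75}$$

If $\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda} \ge 0$, then

$$\left(\sqrt{2\pi}\,x e^{\frac{x^2}{2}}Q(x) - 1\right)\frac{\lambda\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}}{2\sigma_H^2(X|Y,\lambda)} \le 0 \tag{2.76}$$

which further implies that $\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} < 0$. In the meantime, if $\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda} < 0$,

$$\left(\sqrt{2\pi}\,x e^{\frac{x^2}{2}}Q(x) - 1\right)\frac{\lambda\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}}{2\sigma_H^2(X|Y,\lambda)} \le \left(\frac{x^2}{1+x^2} - 1\right)\frac{\lambda\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}}{2\sigma_H^2(X|Y,\lambda)} = -\frac{\lambda\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}}{2\sigma_H^2(X|Y,\lambda)\left(1 + n\lambda^2\sigma_H^2(X|Y,\lambda)\right)}. \tag{2.77}$$

To continue, let us evaluate $\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda}$. From (2.6), (2.7), and (2.9), it is not hard to verify that

$$\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda} = \sum_{x\in\mathcal{X}}\int p(x,y)\frac{\partial f_\lambda(x,y)}{\partial\lambda}\ln^2 p(x|y)\,dy - 2\sigma_H^2(X|Y,\lambda)\left(H(X|Y) + \delta(\lambda)\right) \tag{2.78}$$

where

$$\frac{\partial f_\lambda(x,y)}{\partial\lambda} = \left[-\ln p(x|y) - \left(H(X|Y) + \delta(\lambda)\right)\right]f_\lambda(x,y). \tag{2.79}$$

Plugging (2.79) into (2.78) yields

$$\frac{d\sigma_H^2(X|Y,\lambda)}{d\lambda} = E\left[\left(-\ln p(X_\lambda|Y_\lambda)\right)^3\right] - 3\sigma_H^2(X|Y,\lambda)\left(H(X|Y)+\delta(\lambda)\right) - \left(H(X|Y)+\delta(\lambda)\right)^3 = \hat{M}_H(X|Y,\lambda). \tag{2.80}$$

Combining (2.74), (2.76), (2.77), and (2.80) together, we have

$$\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} \le \frac{\sqrt{n}\sigma_H(X|Y,\lambda)\,e^{-n r_{X|Y}(\delta(\lambda))}}{\sqrt{2\pi}}\left[\frac{\lambda|\hat{M}_H(X|Y,\lambda)|}{2\sigma_H^2(X|Y,\lambda)\left(1 + n\lambda^2\sigma_H^2(X|Y,\lambda)\right)} - 1\right] \tag{2.81}$$
$$\le \frac{\sqrt{n}\sigma_H(X|Y,\lambda)\,e^{-n r_{X|Y}(\delta(\lambda))}}{\sqrt{2\pi}}\left[\frac{\lambda|\hat{M}_H(X|Y,\lambda)|}{2\sigma_H^2(X|Y,\lambda)} - 1\right]. \tag{2.82}$$

In view of the continuity of $\sigma_H^2(X|Y,\lambda)$ and $\hat{M}_H(X|Y,\lambda)$ as functions of $\lambda$, it is easy to see that there is a $\lambda^+ > 0$ such that for any $\lambda \in [0, \lambda^+]$,

$$\frac{\lambda|\hat{M}_H(X|Y,\lambda)|}{2\sigma_H^2(X|Y,\lambda)} - 1 < 0$$

and hence

$$\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} < 0$$

for any $n > 0$. This completes the proof of Lemma 1 with $\delta^+ = \delta(\lambda^+)$.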
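Lemma 1 can be illustrated numerically. Since $\delta(\lambda)$ is strictly increasing, it suffices to check that $g_{X|Y,n}(\delta(\lambda))$ decreases in $\lambda$. The sketch below does this on a grid for an assumed BSC with $p = 0.11$ and $n = 100$ (the helper `g_bsc` is hypothetical); it is a sanity check for one channel, not a substitute for the proof.

```python
import math

def g_bsc(p: float, n: int, lam: float) -> float:
    """g_{X|Y,n}(delta(lam)) of (2.67) for a BSC(p) with uniform input (assumed example)."""
    a, b = -math.log(1 - p), -math.log(p)
    wa, wb = (1 - p) ** (1 - lam), p ** (1 - lam)
    norm = wa + wb
    qa, qb = wa / norm, wb / norm
    mean = qa * a + qb * b                                  # H(X|Y) + delta(lam)
    sigma = math.sqrt(qa * (a - mean) ** 2 + qb * (b - mean) ** 2)
    r = lam * mean - math.log(norm)                         # r_{X|Y}(delta(lam)) via (2.13)
    x = math.sqrt(n) * lam * sigma
    scaled_q = math.exp(x * x / 2) * 0.5 * math.erfc(x / math.sqrt(2))  # e^{x^2/2} Q(x)
    return scaled_q * math.exp(-n * r)

p, n = 0.11, 100
lams = [k / 100 for k in range(101)]                        # lambda in [0, 1]
values = [g_bsc(p, n, lam) for lam in lams]
print(values[0], values[-1])  # values[0] is, up to rounding, Q(0) = 1/2
```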



Remark 5. From (2.81), it is clear that when $n$ is large,

$$\frac{\lambda|\hat{M}_H(X|Y,\lambda)|}{2\sigma_H^2(X|Y,\lambda)\left(1 + n\lambda^2\sigma_H^2(X|Y,\lambda)\right)} - 1 < 0$$

and hence

$$\frac{d g_{X|Y,n}(\delta(\lambda))}{d\lambda} < 0$$

even for $\lambda > \lambda^+$. Nonetheless, as can be seen later, we are concerned only with the case where $\delta_n(\epsilon)$ is around 0. Consequently, the exact value of $\delta^+$ is not important to us.



Remark 6. In view of Lemma 1 and the definition of $\delta_n(\epsilon)$ in (2.67) and (2.68), it follows that $\delta_n(\frac{1}{2}) = 0$ for any $n$ and any BIMSC. However, when $\epsilon < 1/2$, $\delta_n(\epsilon)$ depends not only on $n$ and $\epsilon$, but also on the BIMSC itself through the function $r_{X|Y}(\delta)$. Given $n$ and $\epsilon < 1/2$, the value of $\delta_n(\epsilon)$ fluctuates a lot from one BIMSC to another through the behavior of $r_{X|Y}(\delta)$ around $\delta = 0$, which depends on both the second and third order derivatives of $r_{X|Y}(\delta)$. Given a BIMSC, if $r_{X|Y}(\delta)$ is approximated as in (2.16), then $\delta_n(\epsilon)$ is of the order of $\sqrt{\frac{-\ln\epsilon}{n}}$. Of course, such an approximation is accurate only when $\delta$ or $\sqrt{\frac{-\ln\epsilon}{n}}$ is sufficiently small.
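The definition (2.68) suggests a direct way to compute $\delta_n(\epsilon)$: by Lemma 1, $g_{X|Y,n}$ is decreasing near $\delta = 0$, so one can bisect in $\lambda$ and map back through $\delta(\lambda)$. The sketch below does so for an assumed BSC, prints the leading-order estimate $\sigma_H(X|Y)\sqrt{-2\ln\epsilon/n}$ (obtained by keeping only the quadratic term of (2.16), and consistent with the order stated in Remark 6), and forms $C_{BIMSC} - \delta_n(\epsilon)$, the first two terms of the Taylor-type expansion (the SO approximation mentioned in the Introduction). The function names, the bracket `lam_hi`, and the asymptotic switch in `scaled_q` are our own choices.

```python
import math

def bsc_quantities(p: float, lam: float):
    """(delta(lam), sigma_H(X|Y,lam), r_{X|Y}(delta(lam))) for a BSC(p) with uniform input."""
    a, b = -math.log(1 - p), -math.log(p)
    wa, wb = (1 - p) ** (1 - lam), p ** (1 - lam)
    norm = wa + wb
    qa, qb = wa / norm, wb / norm
    H = (1 - p) * a + p * b
    mean = qa * a + qb * b
    sigma = math.sqrt(qa * (a - mean) ** 2 + qb * (b - mean) ** 2)
    return mean - H, sigma, lam * mean - math.log(norm)

def scaled_q(x: float) -> float:
    """e^{x^2/2} Q(x); uses the asymptotic series for large x to avoid overflow."""
    if x < 20.0:
        return math.exp(x * x / 2) * 0.5 * math.erfc(x / math.sqrt(2))
    return (1 - 1 / x**2 + 3 / x**4) / (x * math.sqrt(2 * math.pi))

def g(p: float, n: int, lam: float) -> float:
    """g_{X|Y,n}(delta(lam)) as defined in (2.67)."""
    _, sigma, r = bsc_quantities(p, lam)
    return scaled_q(math.sqrt(n) * lam * sigma) * math.exp(-n * r)

def delta_n(p: float, n: int, eps: float, lam_hi: float = 0.5, iters: int = 200) -> float:
    """Solve g_{X|Y,n}(delta) = eps (eps < 1/2) by bisection in lambda, cf. (2.68).

    Assumes g is decreasing on [0, lam_hi] (Lemma 1) and g at lam_hi is at most eps.
    """
    if g(p, n, lam_hi) > eps:
        raise ValueError("increase lam_hi")
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(p, n, mid) > eps:
            lo = mid
        else:
            hi = mid
    return bsc_quantities(p, 0.5 * (lo + hi))[0]

if __name__ == "__main__":
    p, n, eps = 0.11, 1000, 1e-3
    C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)  # (2.1), nats
    d = delta_n(p, n, eps)
    sigma0 = bsc_quantities(p, 0.0)[1]
    print("delta_n(eps):", d)
    print("sigma_H * sqrt(-2 ln(eps)/n):", sigma0 * math.sqrt(-2 * math.log(eps) / n))
    print("SO approximation C - delta_n(eps), nats:", C - d)
```

Bisecting in $\lambda$ rather than in $\delta$ avoids inverting $r'_{X|Y}$, since the map $\lambda \mapsto \delta(\lambda)$ is explicit for the BSC.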



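With respect to $\delta_n(\epsilon)$, $R_n(\epsilon)$ has a nice Taylor-type expansion, as shown in Theorem 2.

Theorem 2. Given a BIMSC, for any $n$ and $\epsilon$ satisfying $g_{X|Y,n}(\delta^+) \le \epsilon < 1$,

$$\left|R_n(\epsilon) - \left(C_{BIMSC} - \delta_n(\epsilon)\right)\right| \le o(\delta_n(\epsilon)) \tag{2.83}$$

where

$$o(\delta_n(\epsilon)) = r_{X|Y}(\delta_n(\epsilon)) + \frac{\ln n + d_1}{n} \tag{2.84}$$

if $\epsilon < \frac{1}{2}$, and

$$\left|R_n(\epsilon) - \left(C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)\right)\right| \le \frac{\ln n + d_2}{n} \tag{2.85}$$

otherwise, where $d_1$ and $d_2$ are channel parameters independent of both $n$ and $\epsilon$.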



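Proof: When $\epsilon \ge \frac{1}{2}$, (2.85) can be easily proved by combining (1.5), (1.6), and (2.34). Therefore, it suffices for us to show (2.83) and (2.84) for $\epsilon < \frac{1}{2}$. By (1.1) and the definition of $\bar{\xi}_H(X|Y,\lambda,n)$, for any BIMSC there exists a channel code $C_n$ such that

$$P_e(C_n) \le \left\{\bar{\xi}_H(X|Y,\lambda,n) + \frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}\right\}e^{-n r_{X|Y}(\delta)} \tag{2.86}$$

and

$$R(C_n) \ge C_{BIMSC} - \delta + \frac{1}{n}\ln\left(\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^2(X|Y,\lambda)}e^{-n r_{X|Y}(\delta)}\right) \tag{2.87}$$

which implies that for any $\delta$ such that

$$g_{X|Y,n}(\delta) + \frac{2M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}e^{-n r_{X|Y}(\delta)} \le \epsilon \tag{2.88}$$

the following inequality holds:

$$R_n(\epsilon) \ge C_{BIMSC} - \delta + \frac{1}{n}\ln\left(\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^2(X|Y,\lambda)}e^{-n r_{X|Y}(\delta)}\right) \tag{2.89}$$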
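where $\lambda = r'_{X|Y}(\delta)$. Now let $\delta = \delta_n(\epsilon) + \frac{\eta}{n}$ for some constant $\eta > 0$, which will be specified later, and $\lambda = r'_{X|Y}(\delta)$. By the convexity of $r_{X|Y}(\delta)$,

$$r_{X|Y}(\delta) \ge r_{X|Y}(\delta_n(\epsilon)) + \lambda_n(\epsilon)\frac{\eta}{n} \tag{2.90}$$

where $\lambda_n(\epsilon) = r'_{X|Y}(\delta_n(\epsilon))$. Then

$$g_{X|Y,n}(\delta) + \frac{2M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}e^{-n r_{X|Y}(\delta)}$$
$$\overset{1)}{\le} \left[e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) + \frac{2M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^3(X|Y,\lambda)}\right]e^{-n r_{X|Y}(\delta_n(\epsilon)) - \lambda_n(\epsilon)\eta}$$
$$\overset{2)}{\le} e^{\frac{n\lambda_n^2(\epsilon)\sigma_H^2(X|Y,\lambda_n(\epsilon))}{2}}Q\left(\sqrt{n}\lambda_n(\epsilon)\sigma_H(X|Y,\lambda_n(\epsilon))\right)e^{-n r_{X|Y}(\delta_n(\epsilon))}\left[1 + \frac{2\sqrt{2\pi}M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\left(1 + \frac{1}{n\lambda^2\sigma_H^2(X|Y,\lambda)}\right)\lambda\right]e^{-\lambda_n(\epsilon)\eta}$$
$$\overset{3)}{=} \epsilon\left[1 + \frac{2\sqrt{2\pi}M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\left(1 + \frac{1}{n\lambda^2\sigma_H^2(X|Y,\lambda)}\right)\left(\lambda_n(\epsilon) + \frac{\eta}{n\,\sigma_H^2(X|Y,\bar{\lambda})}\right)\right]e^{-\lambda_n(\epsilon)\eta}$$
$$\overset{4)}{\le} \epsilon\,\frac{1 + \frac{2\sqrt{2\pi}M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\left(1 + \frac{1}{n\lambda^2\sigma_H^2(X|Y,\lambda)}\right)\left(\lambda_n(\epsilon) + \frac{\eta}{n\,\sigma_H^2(X|Y,\bar{\lambda})}\right)}{1 + \lambda_n(\epsilon)\eta + \frac{\lambda_n^2(\epsilon)\eta^2}{2}}. \tag{2.91}$$

In the derivation of (2.91), the inequality 1) is due to (2.90); the inequality 2) follows from the fact that $e^{\frac{x^2}{2}}Q(x)$ is a strictly decreasing function of $x$, from the fact that $\lambda\sigma_H(X|Y,\lambda)$ is strictly increasing with respect to $\lambda$ as shown below

$$\frac{d\left(\lambda\sigma_H(X|Y,\lambda)\right)}{d\lambda} = \sigma_H(X|Y,\lambda) + \frac{\lambda\hat{M}_H(X|Y,\lambda)}{2\sigma_H(X|Y,\lambda)} = \sigma_H(X|Y,\lambda)\left(1 + \frac{\lambda\hat{M}_H(X|Y,\lambda)}{2\sigma_H^2(X|Y,\lambda)}\right) > 0 \tag{2.92}$$

for $\lambda \in [0, \lambda^+]$, and from the inequality

$$e^{\frac{x^2}{2}}Q(x) \ge \frac{x}{\sqrt{2\pi}(1+x^2)}, \qquad x \ge 0;$$

the equality 3) is attributable to $g_{X|Y,n}(\delta_n(\epsilon)) = \epsilon$ and to

$$\lambda = \lambda_n(\epsilon) + \left.\frac{d\lambda}{d\delta}\right|_{\lambda=\bar{\lambda}}\frac{\eta}{n} = \lambda_n(\epsilon) + \frac{\eta}{n\,\sigma_H^2(X|Y,\bar{\lambda})}$$

for some $\bar{\lambda} \in [\lambda_n(\epsilon), \lambda]$; and finally, the inequality 4) follows from the inequality $e^{x} \ge 1 + x + \frac{x^2}{2}$ for any $x \ge 0$. In order to satisfy (2.88), let us now choose $\eta$ such that

$$\frac{2\sqrt{2\pi}M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\left(1 + \frac{1}{n\lambda^2\sigma_H^2(X|Y,\lambda)}\right)\left(\lambda_n(\epsilon) + \frac{\eta}{n\,\sigma_H^2(X|Y,\bar{\lambda})}\right) \le \lambda_n(\epsilon)\eta + \frac{\lambda_n^2(\epsilon)\eta^2}{2} \tag{2.93}$$

so that the right-hand side of (2.91) does not exceed $\epsilon$. To see that $\eta$ is bounded, note that $\frac{M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}$ is always bounded for $\lambda \in [0, \lambda^+]$. On the other hand, for $\epsilon < \frac{1}{2}$, $\sqrt{n}\lambda_n(\epsilon)\sigma_H(X|Y,\lambda_n(\epsilon)) > c$ for some constant $c > 0$, as $\sqrt{n}\lambda_n(\epsilon)\sigma_H(X|Y,\lambda_n(\epsilon)) \to 0$ implies that $\epsilon = g_{X|Y,n}(\delta_n(\epsilon)) \to \frac{1}{2}$, and the same argument can be applied to $\sqrt{n}\lambda\sigma_H(X|Y,\lambda)$. Therefore, $\eta$ can be chosen so that

$$\eta \le 2\sqrt{2\pi}\max_{\lambda\in[0,\lambda^+]}\frac{M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\left(1 + c^{-2}\right)\max\{1, 2c^{-2}\}. \tag{2.98}$$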
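Then combining (2.89), (2.90), and the above choice of $\eta$ yields

$$R_n(\epsilon) \ge C_{BIMSC} - \delta + \frac{1}{n}\ln\left(\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sqrt{n}\,\sigma_H^2(X|Y,\lambda)}e^{-n r_{X|Y}(\delta)}\right)$$
$$= C_{BIMSC} - \delta - r_{X|Y}(\delta) + \frac{\ln\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)} - \frac{1}{2}\ln n}{n}$$
$$\overset{1)}{\ge} C_{BIMSC} - \delta_n(\epsilon) - r_{X|Y}(\delta_n(\epsilon)) - \frac{(1+\lambda)\eta}{n} + \frac{\ln\frac{2(1-C_{BE})M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)} - \frac{1}{2}\ln n}{n}$$
$$\ge C_{BIMSC} - \delta_n(\epsilon) - r_{X|Y}(\delta_n(\epsilon)) - \frac{(1+\lambda^+)\eta - \ln\left(2(1-C_{BE})\min_{\lambda\in[0,\lambda^+]}\frac{M_H(X|Y,\lambda)}{\sigma_H^2(X|Y,\lambda)}\right) + \frac{1}{2}\ln n}{n}$$
$$\ge C_{BIMSC} - \delta_n(\epsilon) - r_{X|Y}(\delta_n(\epsilon)) - \frac{\ln n + d_1}{n} \tag{2.99}$$

where $d_1$ is independent of both $n$ and $\epsilon$. In the derivation of (2.99), the inequality 1) follows from the convexity of $r_{X|Y}(\delta)$ and the fact that

$$r_{X|Y}(\delta) \le r_{X|Y}(\delta_n(\epsilon)) + \lambda\frac{\eta}{n}.$$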
We now proceed to establish an upper bound on Rn{t). Towards this end, recall ( |2.30| ) and 
( |2.31[ ) where we make a small modification by choosing /3„ = A = ?"x|y(^) ™ proof of 
Theorem [TJ Then for any 5 such that 

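$$(1 + 2\lambda)\epsilon \le \underline{\xi}_{X|Y}(\lambda, n)\, e^{-n r_{X|Y}(\delta)} \qquad (2.100)$$

we have

$$R_n(\epsilon) \le C_{\rm BIMSC} - \delta - \frac{\ln\epsilon - \ln P(\mathcal{B}_{n,\delta}) + 2\ln\lambda - \ln(1+\lambda)}{n} \le C_{\rm BIMSC} - \delta + \frac{-\ln\epsilon - 2\ln\lambda + \lambda}{n} \qquad (2.101)$$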



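where the trivial bound $P(\mathcal{B}_{n,\delta}) \le 1$ and the inequality $\ln(1+\lambda) \le \lambda$ are applied. Now let $\delta = \delta_n(\epsilon) - \frac{\eta'}{n}$ for some constant $\eta' > 0$,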






which will be specified later, and $\lambda = r'_{X|Y}(\delta)$. Then



1) nX^ajj^XlY.X) 



> e 



n\^a'jj(X\Y.X) 



QWnXaH[X\Y,\)) 



I g(p.V^AM^ ^^^^^^ 

In the derivation of (2.102), the inequality 1) is due to the convexity of $r_{X|Y}(\delta)$ and the fact that

$$r_{X|Y}(\delta) \le r_{X|Y}(\delta_n(\epsilon)) - \lambda\frac{\eta'}{n};$$

the inequality 2) follows again from the fact that $e^{x^2/2}Q(x)$ is a strictly decreasing function of $x$ and $\lambda\sigma_H(X|Y,\lambda)$ is increasing with respect to $\lambda$; and finally the inequality 3) is attributable to the inequality $e^{x} \ge 1 + x$ for any $x \ge 0$.
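The two elementary facts used in steps 2) and 3), namely that $e^{x^2/2}Q(x)$ is strictly decreasing and that $e^{x} \ge 1 + x$, can be spot-checked numerically. The Python sketch below is only an illustration on a finite grid, not a proof; it computes $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ with `math.erfc`, and the grid range and step are arbitrary choices.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr{N(0,1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def scaled_tail(x: float) -> float:
    """e^{x^2/2} Q(x), the factor that appears in g_{X|Y,n} and xi_{X|Y}."""
    return math.exp(0.5 * x * x) * q_func(x)


# Grid on [0, 30]; beyond roughly x = 38, erfc underflows in double precision.
grid = [0.05 * k for k in range(601)]
values = [scaled_tail(x) for x in grid]

decreasing = all(a > b for a, b in zip(values, values[1:]))
exp_bound = all(math.exp(x) >= 1.0 + x for x in grid)
print("e^{x^2/2}Q(x) strictly decreasing on grid:", decreasing)
print("e^x >= 1 + x on grid:", exp_bound)
```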



In order for (2.100) to be satisfied, we now choose $\eta'$ such that
, ^ 2+iln ^(^/^Aa^(X|F,A)) 



A Q{p, + ,/^XaH{X\Y,X)) 



2+ - In 
A 



where < p < p*. One can verify that 



(p+^Xafj{X\Y,X)y 



1 + P* 



2lT 



Qip. + y/^XaHiX\Y,X)) 



(2.103) 



^ {p+VEXaff(X\Y,X)f 



r]' < 2 



< 2 



P* v^' 



XQ{p, + V^XaH{X\Y,X)) 



p* 1 + (p. + VnXaHiX\Y,X))\ ^^^^^j^\Y,x)(p,- 



-P)- 



(2.104) 



A p^ + ^XaH{X\Y,X) 
where the last inequality is due to (2.93). From the definition of $\rho_*$, it is not hard to see that $\rho_* \le \frac{\eta''}{\sqrt{n}}$ for some constant $\eta''$ depending only on channel parameters. Meanwhile, we have $\sqrt{n}\lambda\sigma_H(X|Y,\lambda) \ge c$ as discussed above. Then



7]' < 2 + 



V" 



nX 



n 



V^XaH{X\Y,X)^ e''"^^'^^^^^ioM]-«(^\Y'^)+'-^ 



< 2 + (c"2 + c-i^" + ^1 



max ouiXW^X) 

Ae[0,A+] 



(2.105) 
which is independent of both $n$ and $\epsilon$. Now combining (2.102) and (2.103), we have



$$\underline{\xi}_{X|Y}(\lambda, n)\, e^{-n r_{X|Y}(\delta)} \ge (1 + 2\lambda)\epsilon \qquad (2.106)$$



and consequently, 



< C, 

1) 

< C, 



BIMSC 



5 + 



- In e - 2 In A + A 



n 



BIMSC 



8n{€) +rx\Y{5n{^)) 



In 



+ 



2Ti^\n{e)aH{X\Y,Xn{e)){l + 



^\l{e)al{X\Y,K{e)) 



n 



+ 



-21nA + A+ + r/' 



n 



Cbimsc - '^n(e) + rx\Y{^n{t)) + 
Inn + ] 

+ 



1^ {} + n\l{e),jllx\Y, 



An(e)) 



Inn + In v^ctj^ (X|y, A„(e)) + In ^ - In v^A + A+ + V 



n 

^ e- / X /r / XX Inn + 

< Cbimsc - cinie) + rx|y(()n(e)) H 

n 



(2.107) 



where $d_2$ is another constant depending only on the channel. In the derivation of (2.107), the inequality 1) is due to (2.93) and the definition of $\delta_n(\epsilon)$ in (2.68); and the inequality 2) follows from the fact that



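$$\frac{\lambda_n(\epsilon)}{\lambda} = 1 + \frac{\eta'}{n\lambda\,\sigma_H^{2}(X|Y,\bar\lambda)}$$

for some $\bar\lambda \in [\lambda, \lambda_n(\epsilon)]$ and

$$\sqrt{n}\lambda\sigma_H(X|Y,\lambda) \ge c.$$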



Then the theorem is proved by combining (2.99) and (2.107) and setting $d = \max\{d_1, d_2\}$.



Remark 7. The condition $\epsilon < 1/2$ for (2.83) and (2.84) can be relaxed, as we only require that $\sqrt{n}\delta_n(\epsilon)$, or equivalently $\sqrt{n}\lambda$, be lower bounded by a constant, which is true when $\epsilon < d$ for any constant $d < 1/2$. In addition, when $\epsilon$ falls below the exponentially small lower threshold on $\epsilon$ required in Theorem 2, $\epsilon$ is an exponential function of $n$, in which case the maximum achievable rate is below the channel capacity by a positive constant even when $n$ goes to $\infty$. As such, from a practical point of view, this case is not interesting, especially since one can approach the channel capacity very closely, as shown by the achievability given in (1.1) and (1.2).
Remark 8. In the definition of $R_n(\epsilon)$, the average error probability is used. If the maximal error probability is used instead, Theorem 2 remains valid. This can be proved similarly by first using the standard technique of removing bad codewords from the code in the achievability given in (1.1) and (1.2) to establish a similar achievability result with maximal error probability, and then combining it with Corollary 1.

Remark 9. In view of Theorem 2, it is now clear that jar decoding is indeed optimal up to the second order coding performance in the non-asymptotic regime. Since the achievability given in (1.1) and (1.2) was established for linear block codes, it follows from Theorem 2 that linear block coding is also optimal up to the second order coding performance in the non-asymptotic regime for any BIMSC. In addition, in the Taylor-type expansion of $R_n(\epsilon)$, the third order term is $O(\delta_n^2(\epsilon))$ whenever $\delta_n(\epsilon) = \Omega(\sqrt{\ln n/n})$, since it follows from (2.16) that $r_{X|Y}(\delta_n(\epsilon)) = O(\delta_n^2(\epsilon))$.

D. Comparison with Asymptotic Analysis 

It is instructive to compare Theorem 2 with the second order asymptotic performance analysis as $n$ goes to $\infty$.

Asymptotic analysis with constant $0 < \epsilon < 1$ and $n \to \infty$: Fix $0 < \epsilon < 1$. It was shown in the literature (see, e.g., [4]) that for a BIMSC with a discrete output alphabet

$$R_n(\epsilon) = C_{\rm BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{\ln n}{n}\Big) \qquad (2.108)$$



for sufficiently large $n$. The expression $C_{\rm BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)$ was referred to as the normal approximation for $R_n(\epsilon)$. Clearly, when $\epsilon \ge 1/2$, (2.108) is essentially the same as (2.85). Let us now look at the case $\epsilon < 1/2$. In this case, by using the Taylor expansion of $r_{X|Y}(\delta)$ around $\delta = 0$,



1 , -Af«(.Y|y),3 , 



dajj{X\Y,\) 




dX 


A=0 



it can be verified that

$$\delta_n(\epsilon) = \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{1}{n}\Big). \qquad (2.110)$$
Thus the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 2 implies the second order asymptotic analysis with constant $0 < \epsilon < 1$ and $n \to \infty$ shown in (2.108).
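As a concrete illustration of the normal approximation in (2.108), the sketch below evaluates $C_{\rm BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt n}Q^{-1}(\epsilon)$ for a binary symmetric channel with crossover probability $p$. It assumes a uniform input, under which $C_{\rm BIMSC} = \ln 2 - h(p)$ and $\sigma_H^2(X|Y) = p(1-p)\ln^2\frac{1-p}{p}$ (the variance of $-\ln p(X|Y)$), all in nats. It drops the $O(\ln n/n)$ term, so it computes the approximation only, not $R_n(\epsilon)$; the function names are illustrative.

```python
import math
from statistics import NormalDist


def bsc_capacity_nats(p: float) -> float:
    """C = ln 2 - h(p) in nats for a BSC with crossover probability p."""
    return math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)


def bsc_sigma_h(p: float) -> float:
    """Standard deviation of -ln p(X|Y) under uniform input."""
    return math.sqrt(p * (1.0 - p)) * abs(math.log((1.0 - p) / p))


def normal_approximation(p: float, n: int, eps: float) -> float:
    """C - sigma_H(X|Y) Q^{-1}(eps) / sqrt(n), without the O(ln n / n) term."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return bsc_capacity_nats(p) - bsc_sigma_h(p) * q_inv / math.sqrt(n)


for n in (100, 1000, 10000):
    print(n, round(normal_approximation(0.11, n, 1e-3), 4))
```

Comparing such values with $C_{\rm BIMSC} - \delta_n(\epsilon)$ is one way to see when the normal approximation stops being reliable.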

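Asymptotic analysis with $n \to \infty$ and non-exponentially decaying $\epsilon$: Suppose now $\epsilon$ is a function of $n$ and goes to 0 as $n \to \infty$, but at a non-exponential speed. In this case, as $n \to \infty$, $\delta_n(\epsilon)$ goes to 0 at the speed of $\Theta\big(\sqrt{-\ln\epsilon/n}\big)$, and $\sqrt{n}\lambda_n(\epsilon)$ goes to $\infty$. By ignoring the third and higher order terms in the Taylor expansion of $r_{X|Y}(\delta)$, one has the following approximations:

$$g_{X|Y,n}(\delta_n(\epsilon)) \approx \frac{1}{\sqrt{2\pi}\,\sqrt{n}\lambda_n(\epsilon)\sigma_H(X|Y,\lambda_n(\epsilon))}\,e^{-\frac{n\delta_n^2(\epsilon)}{2\sigma_H^2(X|Y)}} \qquad (2.111)$$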



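and

$$Q(x) \approx e^{-\frac{x^2}{2}} \quad \text{for large } x.$$

By these approximations, it is not hard to verify that in this case

$$\lim_{n\to\infty}\frac{\delta_n(\epsilon)}{\frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)} = 1.$$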



Therefore, from Theorem 2 it follows that when $\epsilon$ goes to 0 at a non-exponential speed as $n \to \infty$, $\frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)$ is still the second order term of $R_n(\epsilon)$ in the asymptotic analysis with $n \to \infty$. Indeed, this can also be verified by looking at the specific case given by (1.3), (1.4), and (2.33) when $\epsilon$ goes to 0 at a polynomial speed as $n \to \infty$. To the best of our knowledge, the second order asymptotic analysis with $n \to \infty$ and non-exponentially decaying $\epsilon$ has not been addressed before in the literature.
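The approximation (2.111) relies on the standard Gaussian tail asymptotic $e^{x^2/2}Q(x) \approx \frac{1}{\sqrt{2\pi}\,x}$ for large $x$. The small Python sketch below, an illustration only, shows how quickly $\sqrt{2\pi}\,x\,e^{x^2/2}Q(x)$ approaches 1; the relative error behaves roughly like $1/x^2$.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail Q(x) = Pr{N(0,1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_product(x: float) -> float:
    """sqrt(2*pi) * x * e^{x^2/2} * Q(x); tends to 1 from below as x grows."""
    return math.sqrt(2.0 * math.pi) * x * math.exp(0.5 * x * x) * q_func(x)


for x in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(f"x = {x:5.1f}   ratio = {mills_product(x):.6f}")
```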

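Divergence of $\delta_n(\epsilon)$ from $\frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)$: The agreement between $\delta_n(\epsilon)$ and $\frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)$ terminates when the third order term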

$$-\frac{\hat{M}_H(X|Y)}{6\sigma_H^{6}(X|Y)}\,\delta^3$$

in the Taylor expansion of $r_{X|Y}(\delta)$ shown in (2.109) cannot be ignored. This happens when $\delta$ is not small, which is typical in practice for finite block length $n$, or when

$$\zeta_{X|Y} \triangleq -\frac{\hat{M}_H(X|Y)}{6\sigma_H^{6}(X|Y)}$$

is large. In this case, $\frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon)$ will be smaller than $\delta_n(\epsilon)$ by a relatively large margin if $\zeta_{X|Y} < 0$, and larger than $\delta_n(\epsilon)$ by a relatively large margin if $\zeta_{X|Y} > 0$. As such, the normal approximation would fail to provide a reasonable estimate for $R_n(\epsilon)$. This will be further confirmed by numerical results shown in Section IV for well known channels such as the BEC, BSC, and BIAGN for finite $n$.
III. Non-Asymptotic Converse and Taylor-Type Expansion: DIMC

We now extend Theorems 1 and 2 to the case of a DIMC $P = \{p(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$, where $\mathcal{X}$ is discrete, but $\mathcal{Y}$ is arbitrary (discrete or continuous).



A. Definitions 

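Let $\mathcal{P}$ denote the set of all distributions over $\mathcal{X}$. Let $\mathcal{P}_n$ denote the set of types on $\mathcal{X}^n$ with denominator $n$ [7], and let $t(x^n)$ be the type of $x^n$. Moreover, for $t \in \mathcal{P}_n$, let

$$\mathcal{X}^n_t = \{x^n \in \mathcal{X}^n : t(x^n) = t\}. \qquad (3.1)$$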



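Before stating our converse channel coding theorem for DIMC, we again need to introduce some definitions from [6]. For any $t \in \mathcal{P}$, define

$$I(t;P) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\ln\frac{p(y|x)}{q_t(y)}\,dy,$$

where

$$q_t(y) = \sum_{x\in\mathcal{X}} t(x)\,p(y|x),$$

and

$$\lambda^*_-(t;P) = \sup\Big\{\lambda \ge 0 : \sum_{a\in\mathcal{X}} t(a)\int p(y|a)\Big[\frac{p(y|a)}{q_t(y)}\Big]^{-\lambda}dy < \infty\Big\}.$$

(3.2)–(3.5)

It is easy to see that $\lambda^*_-(t;P)$ is the same for all $t \in \mathcal{P}$ with the same support set $\{a \in \mathcal{X} : t(a) > 0\}$. Suppose that

$$\lambda^*_-(t;P) > 0. \qquad (3.6)$$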



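Define for any $t \in \mathcal{P}$ and any $\delta > 0$

$$r_-(t,\delta) = \sup_{\lambda \ge 0}\Big[\lambda\big(\delta - I(t;P)\big) - \sum_{x\in\mathcal{X}} t(x)\ln\int p(y|x)\Big[\frac{p(y|x)}{q_t(y)}\Big]^{-\lambda}dy\Big] \qquad (3.7)$$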



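and for any $t \in \mathcal{P}$ and any $\lambda \in [0, \lambda^*_-(t;P))$, random variables $X_t$ and $Y_{t,\lambda}$ with joint distribution $t(x)p(y|x)f_\lambda(y|x)$, where

$$f_\lambda(y|x) \triangleq \frac{\Big[\frac{p(y|x)}{q_t(y)}\Big]^{-\lambda}}{\int p(v|x)\Big[\frac{p(v|x)}{q_t(v)}\Big]^{-\lambda}dv}. \qquad (3.8)$$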
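When the output alphabet is finite, the integrals in the definitions of $q_t$, $I(t;P)$, and $r_-(t,\delta)$ in (3.7) become sums, and the bracket in (3.7) is concave in $\lambda$ (a linear term minus a weighted sum of log-moment-generating functions). The following Python sketch assumes a finite output alphabet and a transition matrix `P[x][y]` with all entries strictly positive, and computes $r_-(t,\delta)$ by golden-section search over $\lambda \in [0, \lambda_{\max}]$. The cap `lam_max` is an assumption of the sketch that truncates the supremum when it is $+\infty$; all helper names are hypothetical.

```python
import math


def output_dist(t, P):
    """q_t(y) = sum_x t(x) p(y|x) for a finite output alphabet."""
    return [sum(t[x] * P[x][y] for x in range(len(t))) for y in range(len(P[0]))]


def mutual_info(t, P):
    """I(t;P) in nats, with the integral over y replaced by a sum."""
    q = output_dist(t, P)
    return sum(t[x] * P[x][y] * math.log(P[x][y] / q[y])
               for x in range(len(t)) for y in range(len(q))
               if t[x] > 0 and P[x][y] > 0)


def legendre_objective(t, P, delta, lam):
    """lam*(delta - I(t;P)) - sum_x t(x) ln sum_y p(y|x) [p(y|x)/q_t(y)]^(-lam)."""
    q = output_dist(t, P)
    log_mgf = sum(t[x] * math.log(sum(P[x][y] * (P[x][y] / q[y]) ** (-lam)
                                      for y in range(len(q))))
                  for x in range(len(t)) if t[x] > 0)
    return lam * (delta - mutual_info(t, P)) - log_mgf


def r_minus(t, P, delta, lam_max=50.0, iters=200):
    """r_-(t, delta) from (3.7) via golden-section search on [0, lam_max]."""
    def f(lam):
        return legendre_objective(t, P, delta, lam)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, lam_max
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fa, fb = f(a), f(b)
    for _ in range(iters):
        if fa < fb:            # the maximizer lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + ratio * (hi - lo)
            fb = f(b)
        else:                  # the maximizer lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - ratio * (hi - lo)
            fa = f(a)
    return max(0.0, fa, fb)    # lam = 0 always achieves the value 0
```

For a BSC with uniform $t$, $\ln\frac{p(y|x)}{q_t(y)}$ takes only two values and $r_-(t,\delta)$ reduces to the binary divergence $D(a\|p)$ with $a = p + \delta/\ln\frac{1-p}{p}$, which gives a convenient check.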
Then define



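$$D(t,x,\lambda) = \mathbf{E}\Big[\ln\frac{p(Y_{t,\lambda}|X_t)}{q_t(Y_{t,\lambda})}\,\Big|\,X_t = x\Big] \qquad (3.9)$$

$$\delta_-(t,\lambda) = I(t;P) - \sum_{x} t(x)D(t,x,\lambda) \qquad (3.10)$$

$$\Delta^*_-(t) = \lim_{\lambda\uparrow\lambda^*_-(t;P)}\delta_-(t,\lambda) \qquad (3.11)$$

$$\sigma^2_{D,-}(t;P,\lambda) = \sum_{x} t(x)\,\mathrm{Var}\Big[\ln\frac{p(Y_{t,\lambda}|X_t)}{q_t(Y_{t,\lambda})}\,\Big|\,X_t = x\Big] \qquad (3.12)$$

$$M_{D,-}(t;P,\lambda) = \sum_{x} t(x)\,\mathbf{E}\Big[\Big|\ln\frac{p(Y_{t,\lambda}|X_t)}{q_t(Y_{t,\lambda})} - D(t,x,\lambda)\Big|^3\,\Big|\,X_t = x\Big] \qquad (3.13)$$

and

$$\hat{M}_{D,-}(t;P,\lambda) = \sum_{x} t(x)\,\mathbf{E}\Big[\Big(\ln\frac{p(Y_{t,\lambda}|X_t)}{q_t(Y_{t,\lambda})} - D(t,x,\lambda)\Big)^3\,\Big|\,X_t = x\Big]. \qquad (3.14)$$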



Note that $\sigma^2_{D,-}(t;P,\lambda)$, $M_{D,-}(t;P,\lambda)$, and $\hat{M}_{D,-}(t;P,\lambda)$ are respectively the conditional variance, conditional third absolute central moment, and conditional third central moment of $\ln\frac{p(Y_{t,\lambda}|X_t)}{q_t(Y_{t,\lambda})}$ given $X_t$. Write $\sigma^2_{D,-}(t;P,0)$ simply as $\sigma^2_D(t;P)$, $M_{D,-}(t;P,0)$ as $M_D(t;P)$, and $\hat{M}_{D,-}(t;P,0)$ as $\hat{M}_D(t;P)$. Assume that

$$\sigma^2_D(t;P) > 0 \quad\text{and}\quad M_D(t;P) < \infty. \qquad (3.15)$$
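Furthermore, $r_-(t,\delta)$ has the following parametric expression

$$r_-(t,\delta_-(t,\lambda)) = \lambda\big(\delta_-(t,\lambda) - I(t;P)\big) - \sum_{x} t(x)\ln\int p(y|x)\Big[\frac{p(y|x)}{q_t(y)}\Big]^{-\lambda}dy \qquad (3.16)$$

with $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$ satisfying $\delta_-(t,\lambda) = \delta$. In addition, let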

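$$\bar{\xi}_{D,-}(t;P,\lambda,n) \triangleq \frac{2C_{BE}M_{D,-}(t;P,\lambda)}{\sqrt{n}\,\sigma^3_{D,-}(t;P,\lambda)} + e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}\Big[Q\big(\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big) - Q\big(\rho^* + \sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big)\Big] \qquad (3.17)$$

$$\underline{\xi}_{D,-}(t;P,\lambda,n) \triangleq e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}\,Q\big(\rho_* + \sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big) \qquad (3.18)$$

with $Q(\rho^*) = \frac{C_{BE}M_{D,-}(t;P,\lambda)}{\sqrt{n}\,\sigma^3_{D,-}(t;P,\lambda)}$ and $Q(\rho_*) = \frac{1}{2} - \frac{2C_{BE}M_{D,-}(t;P,\lambda)}{\sqrt{n}\,\sigma^3_{D,-}(t;P,\lambda)}$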



. Similar to the case in Section II, the purpose of introducing the above definitions is to utilize the following results, proved as Theorem 8 in [6], which are valid for any $t \in \mathcal{P}_n$ satisfying (3.6) and (3.15).

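(a) There exists a $\delta^* > 0$ such that for any $\delta \in (0, \delta^*]$

$$r_-(t,\delta) = \frac{1}{2\sigma^2_D(t;P)}\delta^2 + O(\delta^3). \qquad (3.19)$$

(b) For any $\delta \in (0, \Delta^*_-(t))$ and any $x^n \in \mathcal{X}^n_t$,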



(3.20) 



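where $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta} > 0$, and $Y^n = Y_1Y_2\cdots Y_n$ is the output of the DIMC in response to an independent and identically distributed (IID) input $X^n = X_1X_2\cdots X_n$, the common distribution of each $X_i$ having $\mathcal{X}$ as its support set. Moreover, when $\delta = o(1)$ and $\delta = \Omega(1/\sqrt{n})$,

$$\bar{\xi}_{D,-}(t;P,\lambda,n) = e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}Q\big(\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big)(1 + o(1)) \qquad (3.21)$$

$$\underline{\xi}_{D,-}(t;P,\lambda,n) = e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}Q\big(\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big)(1 - o(1)) \qquad (3.22)$$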



ia2(t|, _{t;P,A) 



1 



nA 



e ^ g (v^A(Ti5,_(t; P, A)) = 

wzY/z A = r'x{6) = 6(5). 
(c) For any 6 < c^/-^, where c < aoit; P) is a constant, and x" E XJ^, 



(3.23) 



_ CBEMnit;P) ri p(nX") 



< Q 



6\/n 



+ 



CsEMoit-P) 



aoit-P)] ' ^alit-P) • 
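The parametric form (3.16) suggests computing $\delta_-(t,\lambda)$, $\sigma^2_{D,-}(t;P,\lambda)$, and $r_-(t,\delta_-(t,\lambda))$ directly from the tilted distribution in (3.8) rather than by optimizing over $\lambda$. The Python sketch below does this for a finite output alphabet with all $p(y|x) > 0$. It takes $\delta_-(t,\lambda) = I(t;P) - \sum_x t(x)D(t,x,\lambda)$, the stationarity condition of the supremum in (3.7); that identification is an assumption of the sketch. Two properties are easy to check numerically: $d\delta_-(t,\lambda)/d\lambda = \sigma^2_{D,-}(t;P,\lambda)$, and $r_-(t,\delta) \approx \delta^2/(2\sigma^2_D(t;P))$ for small $\delta$, in line with (3.19).

```python
import math


def tilted_quantities(t, P, lam):
    """Return (delta_-(t,lam), sigma^2_{D,-}(t;P,lam), r_-(t, delta_-(t,lam)))
    for a finite-output channel with all p(y|x) > 0."""
    nx, ny = len(t), len(P[0])
    q = [sum(t[x] * P[x][y] for x in range(nx)) for y in range(ny)]
    info = mean = var = log_norm = 0.0
    for x in range(nx):
        if t[x] == 0:
            continue
        z = [math.log(P[x][y] / q[y]) for y in range(ny)]       # ln p(y|x)/q_t(y)
        w = [P[x][y] * math.exp(-lam * z[y]) for y in range(ny)]
        norm = sum(w)                                           # denominator of (3.8)
        f = [wy / norm for wy in w]                             # p(y|x) f_lam(y|x)
        m = sum(fy * zy for fy, zy in zip(f, z))                # D(t,x,lam), (3.9)
        info += t[x] * sum(P[x][y] * z[y] for y in range(ny))
        mean += t[x] * m
        var += t[x] * sum(fy * (zy - m) ** 2 for fy, zy in zip(f, z))
        log_norm += t[x] * math.log(norm)
    delta = info - mean
    rate = lam * (delta - info) - log_norm                      # (3.16)
    return delta, var, rate
```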
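Turn our attention to sequences in $\mathcal{Y}^n$. For any $t \in \mathcal{P}_n$ and any $x^n \in \mathcal{X}^n_t$, define

$$\mathcal{B}_t(x^n,\delta) \triangleq \Big\{y^n : -\infty < \frac{1}{n}\ln\frac{p(y^n|x^n)}{q_t(y^n)} \le I(t;P) - \delta\Big\}$$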



(3.24) 



(3.25) 



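and

$$P_{t,\delta} \triangleq P_{x^n}\big(\mathcal{B}_t(x^n,\delta)\big) = \Pr\Big\{-\infty < \frac{1}{n}\ln\frac{p(Y^n|x^n)}{q_t(Y^n)} \le I(t;P) - \delta\Big\} \qquad (3.26)$$

where $P_{t,\delta}$ depends only on the type $t$ and $\delta$. Since for any $y^n \in \mathcal{Y}^n$ the set

$$\Big\{x^n \in \mathcal{X}^n_t : \frac{1}{n}\ln\frac{p(y^n|x^n)}{q_t(y^n)} > I(t;P) - \delta\Big\} \qquad (3.27)$$

is referred to as a DIMC jar for $y^n$ based on type $t$ in [1], we shall call $\mathcal{B}_t(x^n,\delta)$ the outer mirror image of jar corresponding to $x^n$. Further define

$$\mathcal{B}_{t,n,\delta} = \bigcup_{x^n\in\mathcal{X}^n_t}\mathcal{B}_t(x^n,\delta) \qquad (3.28)$$

$$P(\mathcal{B}_{t,n,\delta}) = \int_{\mathcal{B}_{t,n,\delta}} q_t(y^n)\,dy^n. \qquad (3.29)$$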
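The Chernoff bound $P_{t,\delta} \le e^{-n r_-(t,\delta)}$ from Theorem 8 in [6], which is used later in this section, can be checked exactly on a small example. The Python sketch below assumes a BSC with crossover probability $p$ and the uniform type $t$ (the example uses even $n$ so that this type exists). For this channel, $\frac1n\ln\frac{p(y^n|x^n)}{q_t(y^n)} \le I(t;P) - \delta$ holds exactly when the fraction of flipped symbols is at least $a = p + \delta/\ln\frac{1-p}{p}$, so $P_{t,\delta}$ is a binomial tail and $r_-(t,\delta) = D(a\|p)$. The helper names are hypothetical.

```python
import math


def kl_bernoulli(a: float, p: float) -> float:
    """Binary divergence D(a || p) in nats."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def bsc_threshold_and_rate(p: float, delta: float):
    """Flip-fraction threshold a and r_-(t, delta) = D(a || p) for a BSC(p), uniform t.
    Valid while p < a < 1, i.e. 0 < delta < (1 - p) ln((1 - p)/p)."""
    a = p + delta / math.log((1 - p) / p)
    return a, kl_bernoulli(a, p)


def bsc_p_t_delta(p: float, n: int, delta: float) -> float:
    """Exact P_{t,delta} of (3.26): Pr{K >= n a} with K ~ Binomial(n, p), in log space."""
    a, _ = bsc_threshold_and_rate(p, delta)
    k_min = math.ceil(n * a)
    log_terms = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log(1 - p)
                 for k in range(k_min, n + 1))
    return sum(math.exp(v) for v in log_terms)


p = 0.11
for n in (50, 200, 800):
    for delta in (0.05, 0.1, 0.2):
        exact = bsc_p_t_delta(p, n, delta)
        bound = math.exp(-n * bsc_threshold_and_rate(p, delta)[1])
        print(f"n={n:4d} delta={delta:.2f}  P={exact:.3e}  Chernoff={bound:.3e}")
```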

B. Converse Coding Theorem 

For any channel code $\mathcal{C}_n$ of block length $n$ with average word error probability $P_e(\mathcal{C}_n) = \epsilon_n$, assume that the message $M$ is uniformly distributed in $\{1, 2, \ldots, e^{nR(\mathcal{C}_n)}\}$. Let $x^n(m)$ be the codeword corresponding to the message $m$, and $\epsilon_{m,n}$ the conditional error probability given message $m$. Then

$$\epsilon_n = \mathbf{E}[\epsilon_{M,n}]. \qquad (3.30)$$



Let $\beta_n = \sqrt{\frac{-2\ln\epsilon_n}{n}}$ and

$$\mathcal{M} = \{m : \epsilon_{m,n} \le \epsilon_n(1 + \beta_n)\}. \qquad (3.31)$$



Consider a type $t \in \mathcal{P}_n$ such that

$$\big|\{m \in \mathcal{M} : t(x^n(m)) = t\}\big| \ge \frac{|\mathcal{M}|}{(n+1)^{|\mathcal{X}|}}. \qquad (3.32)$$

Here and throughout the paper, $|S|$ denotes the cardinality of a finite set $S$. Since $|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}$, it follows from the pigeonhole principle that such a type $t \in \mathcal{P}_n$ exists. In other words, if we classify the codewords in $\{x^n(m) : m \in \mathcal{M}\}$ according to their types, then there is at least one type $t \in \mathcal{P}_n$ such that the number of codewords in $\{x^n(m) : m \in \mathcal{M}\}$ with that type is not less than the average.
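The selection in (3.30)–(3.32) combines Markov's inequality with the pigeonhole principle over types. The Python sketch below mimics it on made-up data: the codewords and conditional error probabilities are random placeholders rather than a real code, and `beta` is an arbitrary choice. It forms $\mathcal{M}$ as in (3.31), which by Markov's inequality contains at least a $\beta/(1+\beta)$ fraction of the messages, groups its codewords by type, and returns a most frequent type, which necessarily satisfies (3.32) because $|\mathcal{P}_n| = \binom{n+|\mathcal{X}|-1}{|\mathcal{X}|-1} \le (n+1)^{|\mathcal{X}|}$.

```python
import math
import random
from collections import Counter


def type_of(word, alphabet):
    """Type of x^n, stored as the vector of symbol counts (denominator n)."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)


def num_types(n, k):
    """|P_n| for an alphabet of size k: C(n+k-1, k-1), at most (n+1)^k."""
    return math.comb(n + k - 1, k - 1)


def select_good_messages_and_type(codewords, cond_errors, beta, alphabet):
    """Form M of (3.31) and return it together with a type meeting (3.32)."""
    eps = sum(cond_errors) / len(cond_errors)                      # (3.30)
    good = [m for m, e in enumerate(cond_errors) if e <= eps * (1 + beta)]
    by_type = Counter(type_of(codewords[m], alphabet) for m in good)
    t, count = by_type.most_common(1)[0]
    return good, t, count


random.seed(1)
alphabet, n, num_msgs, beta = (0, 1, 2), 8, 500, 0.25
codewords = [tuple(random.choice(alphabet) for _ in range(n)) for _ in range(num_msgs)]
cond_errors = [random.random() ** 3 for _ in range(num_msgs)]       # placeholder values
good, t, count = select_good_messages_and_type(codewords, cond_errors, beta, alphabet)
print(len(good), t, count)
```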

We are now ready to state our converse theorem for DIMC. 

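Theorem 3. Given a DIMC, for any channel code $\mathcal{C}_n$ of block length $n$ with average word error probability $P_e(\mathcal{C}_n) = \epsilon_n$,

$$R(\mathcal{C}_n) \le I(t;P) - \delta - \frac{\ln\epsilon_n - \ln P(\mathcal{B}_{t,n,\delta})}{n} + \frac{|\mathcal{X}|\ln(n+1) + \ln n}{n} - \frac{\ln(-2\ln\epsilon_n) - \ln\Big(1 + \sqrt{\frac{-2\ln\epsilon_n}{n}}\Big)}{n} \qquad (3.33)$$

for any $t \in \mathcal{P}_n$ satisfying (3.32), where $\delta$ is the largest number satisfying

$$\Big(1 + 2\sqrt{\frac{-2\ln\epsilon_n}{n}}\Big)\epsilon_n \le P_{t,\delta}. \qquad (3.34)$$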



Moreover, if a type $t \in \mathcal{P}_n$ satisfying (3.32) also satisfies (3.6) and (3.15), the following hold:

1) 



n 



n 



In - In (^1 + 



-2 In en 



77, 



where $\delta$ is the solution to



1 + 2 



-21ne 



n 



(3.35) 



(3.36) 



with $\delta_-(t,\lambda) = \delta$.
2) W^/^en e„ = ^ (1 - ^) for a e (0, 1), 



3) When 



i?(C„) < I{t; P) - y2aB(t; P)n-'^ + 0(n-(i-")). 
, (1 - /or a > a 

2^™ Inn V 2alnn/ 



N / ^ / , ,2a; Inn ^ /\nn 
R{Cn) < I{t-P)-aD{t-P)sl + 



n 



n 



(3.37) 



(3.38) 



4) When e„ = e satisfying e + ^ (2ev/^2lEI + < 1, 

,, ^ , Inn Ine 



n n 



lit; P) - ^-^Q-^ (e) + (1^1 + 1)^ + O(n-) 



(3.39) 
(3.40) 

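Proof: We again apply the outer mirror image of jar converse-proof technique. By the Markov inequality,

$$\Pr\{M \in \mathcal{M}\} \ge \frac{\beta_n}{1+\beta_n} \quad\text{and}\quad |\mathcal{M}| \ge e^{nR(\mathcal{C}_n) + \ln\frac{\beta_n}{1+\beta_n}}. \qquad (3.41)$$

For any $t \in \mathcal{P}_n$ satisfying (3.32), let

$$\mathcal{M}_t = \{m : \epsilon_{m,n} \le \epsilon_n(1+\beta_n),\ t(x^n(m)) = t\}. \qquad (3.42)$$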
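Then

$$|\mathcal{M}_t| \ge \frac{|\mathcal{M}|}{(n+1)^{|\mathcal{X}|}} \ge e^{nR(\mathcal{C}_n) + \ln\frac{\beta_n}{1+\beta_n} - |\mathcal{X}|\ln(n+1)}. \qquad (3.43)$$

Denote the decision region for message $m \in \mathcal{M}_t$ as $D_m$. Now for any $m \in \mathcal{M}_t$,

$$P_{x^n(m)}\big(\mathcal{B}_t(x^n(m),\delta) \cap D_m\big) \ge P_{t,\delta} - \epsilon_{m,n} \ge P_{t,\delta} - \epsilon_n(1+\beta_n). \qquad (3.44)$$

At this point, we select $\delta$ such that for any $x^n \in \mathcal{X}^n_t$

$$P_{x^n}\big(\mathcal{B}_t(x^n,\delta)\big) = P_{t,\delta} \ge (1 + 2\beta_n)\epsilon_n. \qquad (3.45)$$

Substituting (3.45) into (3.44), we have

$$P_{x^n(m)}\big(\mathcal{B}_t(x^n(m),\delta) \cap D_m\big) \ge \beta_n\epsilon_n. \qquad (3.46)$$

By the fact that the $D_m$ are disjoint for different $m$ and

$$\bigcup_{m\in\mathcal{M}_t}\big(D_m \cap \mathcal{B}_t(x^n(m),\delta)\big) \subseteq \mathcal{B}_{t,n,\delta}, \qquad (3.47)$$

we have

$$P(\mathcal{B}_{t,n,\delta}) \ge \sum_{m\in\mathcal{M}_t} e^{-n(I(t;P)-\delta)}\,P_{x^n(m)}\big(\mathcal{B}_t(x^n(m),\delta) \cap D_m\big) \ge |\mathcal{M}_t|\,e^{-n(I(t;P)-\delta)}\beta_n\epsilon_n \qquad (3.48)$$

which implies that

$$|\mathcal{M}_t| \le e^{n(I(t;P)-\delta) - \ln\beta_n - \ln\epsilon_n + \ln P(\mathcal{B}_{t,n,\delta})}. \qquad (3.49)$$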






Then combining (3.43) and (3.49) yields

$$R(\mathcal{C}_n) \le I(t;P) - \delta - \frac{2\ln\beta_n - \ln(1+\beta_n)}{n} - \frac{\ln\epsilon_n - \ln P(\mathcal{B}_{t,n,\delta})}{n} + \frac{|\mathcal{X}|\ln(n+1)}{n}. \qquad (3.50)$$

Since $\beta_n = \sqrt{\frac{-2\ln\epsilon_n}{n}}$ by definition, (3.33) and (3.34) directly come from (3.50) and (3.45).

1) According to (3.20), selecting $\delta$ to be the solution to (3.36) suffices for (3.45). Consequently, (3.35) is proved.

2) The proof is essentially the same as that for part 2) of Theorem 1, where we can show that

(3.51) 



when e„ = ^^/=^ (l — and 5 = \p2oY){t] P)n~^ — rju'^^^"'^ for some constant r/. 
3) Apply the trivial bound P(Bt n,5) < 1- Then similar to the proof for part 3) of Theorem 
|l| one can verify that by making 6 = crD _(t; P)- 
constant rj. 



V 



In n 



for some properly chosen 



Pt,s > LJi;^, 



dr_{t,5) 



n e 



-nr- {t,S) 



> 



1 + 2 



— 21ner 



n 



(3.52) 



for e 



" 2VTTa ln» 



= (1 - 2^). where (|3J9]), ( [X22l ) and ( [X23] ) are utilized. 



4) According to (3.45), we should select $\delta$ such that

$$P_{t,\delta} \ge \Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big)\epsilon. \qquad (3.53)$$

Now by (3.24),



(3.54) 



will guarantee (3.53). Consequently, (3.39) is proved by substituting (3.53) and $\epsilon_n = \epsilon$ into (3.50) and applying the trivial bound $P(\mathcal{B}_{t,n,\delta}) \le 1$, and (3.40) follows from the property of the function shown in the proof of Theorem 1.



Remark 10. Remarks similar to Remarks 2 and 3 can be drawn here too for Theorem 3.



For maximal error probability, we have the following corollary, which can be proved similarly. 
Corollary 2. Given a DIMC, for any channel code $\mathcal{C}_n$ of block length $n$ with maximum error probability $P_m(\mathcal{C}_n) = \epsilon_n$,

,ln(n + l) In 



-2 In e„ 



(3.55) 



n n n 

for any $t \in \mathcal{P}_n$ such that at least a $(n+1)^{-|\mathcal{X}|}$ portion of the codewords in $\mathcal{C}_n$ have type $t$, where $\delta$ is the largest number satisfying



n 



(3.56) 



Moreover, if $t \in \mathcal{P}_n$ satisfies (3.6) and (3.15), then the following hold:
1) 



R{Cn)<I(t-P)-5- 
where $\delta$ is the solution to

1 + 



In e„ - In P{Bt^n,s 
n 



\n{n + 1) 



In 



n 



n 



-21ne, 



n 



- 1 en = {j^_{t;P,X,n)e 



-nr^ {t,5) 



(3.57) 



(3.58) 



with $\delta_-(t,\lambda) = \delta$.
2) When en = e satisfying e + ^ (eV^ThTe + ^^f^^P) < I, 

R{Cn) < I{t-P)-—j=-Q [e+ — [eV^e+ ^3^^^.^^ 

+ (|;^|+0.5)^"" 



n n 



(3.59) 



I{t- P) - ^^^g-^ {e) + {\X\+ 0.5)^^^^^ + 0(n-^). (3.60) 



n 



n 



C. Taylor-Type Expansion

Fix a DIMC $P = \{p(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$ with capacity $C_{\rm DIMC} > 0$. For any block length $n$ and average error probability $\epsilon$, let $R_n(\epsilon)$ be the best coding rate achievable with block length $n$ and average error probability $\le \epsilon$, as defined in (2.66). In this subsection, we extend Theorem 2 to establish a Taylor-type expansion of $R_n(\epsilon)$ in the case of DIMC.

We begin by reviewing the non-asymptotic achievability of jar decoding established in [1]. It has been proved in [1] that under jar decoding, Shannon random codes $\mathcal{C}_n$ of block length $n$ based on any type $t \in \mathcal{P}_n$ satisfying (3.6) and (3.15) have the following performance:
1)



(0.5 + |A'|)ln(n + l) - In 



n 



R{Cn)>I{t;P)-6-r_{t,6)~ 
while maintaining 

\ V^crg, _(t;P, A) 

for any $\delta \in (0, \Delta^*_-(t))$, where $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$ satisfies $\delta_-(t,\lambda) = \delta$.



2{l-CBE)Mo,-{t;P,\) 



-nr— {t,S) 



(3.61) 



(3.62) 



2) 



3) 



R{Cn)>I{t;P)-a^{t;P) 
while maintaining 

71 

Pe{Cn) < 

for any $\alpha > 0$.

R{Cn) > I{t;P)- 
while maintaining 



2alnn {0.5 + a + \X\)\n{n + 1) ^ /lnlnn\ 

(J I I (3.63) 



n 



n 



n 



2\/'Ka\nn 



(3.64) 



n 



2 ' 7 n n (T3(t; P) 



(3.66) 
for any real number c. 

By combining (3.61) and (3.62) with (3.33) and (3.34), or with (3.35) and (3.36), it is expected that $R_n(\epsilon)$ would be expanded as

$$R_n(\epsilon) = I(t;P) - \delta + o(\delta) \qquad (3.67)$$

for some $t \in \mathcal{P}$, where $\delta$ is defined according to (3.62), (3.34), or (3.36). In the rest of this subsection, we shall demonstrate with mathematical rigor that this is indeed the case. To simplify our argument, we impose the following conditions* on the channel:
(C1) For any $t \in \mathcal{P}$, $M_D(t;P) < \infty$.

*Some of these conditions, for example, Condition (C3), can be relaxed. Here we choose not to do so in order not to make our subsequent argument unnecessarily complicated.
(C2) $\sigma^2_D(t;P) = 0$ implies $I(t;P) = 0$.
(C3) For any $t \in \mathcal{P}$, $\lambda^*_-(t;P) = +\infty$.
(C4) There exists $\lambda^* > 0$ such that $\delta_-(t,\lambda)$, $\sigma^2_{D,-}(t;P,\lambda)$, $M_{D,-}(t;P,\lambda)$, $\hat{M}_{D,-}(t;P,\lambda)$, and $r_-(t,\delta_-(t,\lambda))$ are continuous functions of $t$ and $\lambda$ over $(t,\lambda) \in \mathcal{P}\times[0,\lambda^*]$.
(C5) There exists $s^* > 0$ such that $r_-^{-1}(t,s)$ is a continuous function of $t$ and $s$ over $(t,s) \in \mathcal{P}\times[0,s^*]$, where $r_-^{-1}(t,\cdot)$ is an inverse function of $r_-(t,\cdot)$.

Since $r_-(t,\delta)$ is a continuous and strictly increasing function of $\delta$ before it reaches $+\infty$ (which may or may not happen), it can be easily verified that for any $s > 0$

$$r_-^{-1}(t,s) = \max\{\delta : r_-(t,\delta) \le s\} = \inf\{\delta : r_-(t,\delta) > s\}. \qquad (3.68)$$
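Because $r_-(t,\cdot)$ is nondecreasing, the inverse in (3.68) can be evaluated by bisection. The minimal Python sketch below assumes only a callable nondecreasing `rate` with `rate(0) = 0` and a finite search cap `delta_max` (both assumptions of the sketch). The example rate is the closed form $D(a\|p)$, $a = p + \delta/\ln\frac{1-p}{p}$, which holds for a BSC with uniform $t$.

```python
import math


def inverse_rate(rate, s, delta_max, tol=1e-12):
    """max{delta in [0, delta_max] : rate(delta) <= s}, cf. (3.68), for a
    nondecreasing rate with rate(0) = 0 <= s.  Invariant: rate(lo) <= s < rate(hi)."""
    if rate(delta_max) <= s:
        return delta_max
    lo, hi = 0.0, delta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate(mid) <= s:
            lo = mid
        else:
            hi = mid
    return lo


p = 0.11


def bsc_rate(delta):
    """r_-(t, delta) = D(a || p) for a BSC(p) with uniform t."""
    a = p + delta / math.log((1 - p) / p)
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


delta = inverse_rate(bsc_rate, 0.01, delta_max=1.0)
print(delta, bsc_rate(delta))
```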

In view of the definitions and properties of $\delta_-(t,\lambda)$, $\sigma^2_{D,-}(t;P,\lambda)$, $M_{D,-}(t;P,\lambda)$, $\hat{M}_{D,-}(t;P,\lambda)$, and $r_-(t,\delta)$ (see [6] for details and examples), Conditions (C1) to (C5) are generally met by most channels, particularly by channels with discrete output alphabets and discrete input additive white Gaussian channels.

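To characterize $\delta$ in (3.67) analytically, we need a counterpart of Lemma 1. To this end, define for any $0 < c < C_{\rm DIMC}$

$$\mathcal{P}(c) = \{t \in \mathcal{P} : I(t;P) \ge c\} \qquad (3.69)$$

$$\mathcal{P}_n(c) = \{t \in \mathcal{P}_n : I(t;P) \ge c\} \qquad (3.70)$$

and for any type $t \in \mathcal{P}$ satisfying $\sigma^2_D(t;P) > 0$

$$g_{t;P,n}(\delta) = e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}\,Q\big(\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\big)\,e^{-n r_-(t,\delta)} \qquad (3.71)$$

where $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$. Note that $\mathcal{P}(c)$ is a closed set, and it follows from Condition (C2) that $\sigma^2_D(t;P) > 0$ for any $t \in \mathcal{P}(c)$. Interpret $g_{t;P,n}(\delta)$ as a function of $\lambda$ through $\delta = \delta_-(t,\lambda)$. Then we have the following lemma.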

Lemma 2. There exists $\lambda^+ > 0$ such that for any $n > 0$ and $t \in \mathcal{P}(c)$, $g_{t;P,n}(\delta_-(t,\lambda))$ is a strictly decreasing function of $\lambda$ over $\lambda \in [0,\lambda^+]$.

Proof: The proof is in parallel with that of Lemma 1. As such, we point out only the places where differences occur. In place of (2.80), we now have

$$\frac{d\sigma^2_{D,-}(t;P,\lambda)}{d\lambda} = -\hat{M}_{D,-}(t;P,\lambda). \qquad (3.72)$$
In parallel with (2.81) and (2.82), we now have for any $t \in \mathcal{P}(c)$
dgt-p,n{.^-{t,X)) 



dX 

<• g-nr_(i,<5_(t,A)) 

^ g-nr_(i,5,(i,A)) 
^ -nr.{t,S-{t,X)) 



y/nCTD- (t; P, A) 



27r 

y/naD- {t; P, A) 



A 



do"!, _(t;P,A) 



27r 

y/naD- {t; P, A) 



2a|,,_(t;P,A) (l + nAV|,,_(t; P, A)) 
AMz5,-(t;P,A) 



2a|,^_(t;P,A)(l + nAV|,__(t;P,A)) 
AM^,_(t;P,A) 



- 1 



- 1 



1 (3.73) 



(3.74) 



2or|,_(t;P,A) 

Since $\mathcal{P}(c)$ is closed, it then follows from Condition (C4) that there is a $\lambda^+ > 0$ such that for any $\lambda \in [0,\lambda^+]$ and any $t \in \mathcal{P}(c)$

'AM,,,_(t;P,A) 



and hence 



2a^_(t;P,A) 

dgt-p^niS-{t,\)) 
dX 

for any $n > 0$. This completes the proof of Lemma 2.



- 1 < 



< 



Remark 11. In view of (3.73), it is clear that when $n$ is large, $g_{t;P,n}(\delta_-(t,\lambda))$ is a strictly decreasing function of $\lambda$ over an interval even larger than $[0,\lambda^+]$ for each and every $t \in \mathcal{P}(c)$.
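Lemma 2 suggests a simple numerical procedure for $\delta_{t,n}(\epsilon)$: parametrize by $\lambda$, evaluate $g_{t;P,n}(\delta_-(t,\lambda))$ from (3.71) in log space, and bisect on $\lambda$ until it equals $\epsilon$. The Python sketch below does this for a finite output alphabet with all $p(y|x) > 0$. It again takes $\delta_-(t,\lambda) = I(t;P) - \sum_x t(x)D(t,x,\lambda)$, and the bracket `lam_hi` is an assumption: it must lie in a range where $g$ is decreasing (Lemma 2, Remark 11) and be large enough that $g$ drops below $\epsilon$. The BSC example values are arbitrary.

```python
import math


def q_func(x):
    """Gaussian tail Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tilted(t, P, lam):
    """(delta_-(t,lam), sigma^2_{D,-}(t;P,lam), r_-(t, delta_-(t,lam))) for a
    finite-output channel with all p(y|x) > 0, via the tilted law (3.8)."""
    nx, ny = len(t), len(P[0])
    q = [sum(t[x] * P[x][y] for x in range(nx)) for y in range(ny)]
    info = mean = var = log_norm = 0.0
    for x in range(nx):
        if t[x] == 0:
            continue
        z = [math.log(P[x][y] / q[y]) for y in range(ny)]
        w = [P[x][y] * math.exp(-lam * z[y]) for y in range(ny)]
        norm = sum(w)
        m = sum(wy * zy for wy, zy in zip(w, z)) / norm
        info += t[x] * sum(P[x][y] * z[y] for y in range(ny))
        mean += t[x] * m
        var += t[x] * sum(wy * (zy - m) ** 2 for wy, zy in zip(w, z)) / norm
        log_norm += t[x] * math.log(norm)
    delta = info - mean
    return delta, var, lam * (delta - info) - log_norm


def log_g(t, P, n, lam):
    """ln g_{t;P,n}(delta_-(t,lam)) from (3.71), computed in log space."""
    _, var, rate = tilted(t, P, lam)
    x = math.sqrt(n * var) * lam
    return 0.5 * x * x + math.log(q_func(x)) - n * rate


def delta_tn(t, P, n, eps, lam_hi=0.9, iters=200):
    """Solve g = eps by bisection on lam, assuming g decreases on [0, lam_hi]
    and g(lam_hi) < eps <= g(0) = 1/2.  Returns (delta_{t,n}(eps), lam)."""
    lo, hi, target = 0.0, lam_hi, math.log(eps)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_g(t, P, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return tilted(t, P, lo)[0], lo


p = 0.11
P = [[1 - p, p], [p, 1 - p]]
t = [0.5, 0.5]
delta, lam = delta_tn(t, P, n=1000, eps=1e-3)
print(delta, lam)
```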



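Now let

$$\epsilon_+ = \max\{g_{t;P,n}(\delta_-(t,\lambda^+/2)) : t \in \mathcal{P}(c)\}$$

which, in view of Condition (C4) and the fact that $\mathcal{P}(c)$ is closed, is well defined and is also an exponential function of $n$. For any $\epsilon_+ \le \epsilon < 1/2$ and $t \in \mathcal{P}(c)$, let $\delta_{t,n}(\epsilon)$ be the unique solution to

$$g_{t;P,n}(\delta) = \epsilon. \qquad (3.75)$$

Further define

$$s(c) = \max\Big\{s : 0 \le s \le s^*,\ r_-^{-1}(t,s) \le \frac{C_{\rm DIMC} - c}{2}\ \ \forall t \in \mathcal{P}\Big\} \qquad (3.76)$$

and let $\epsilon_n(c)$ be the unique solution $\epsilon$ to

$$-\frac{1}{n}\ln\Big[\epsilon\Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big)\Big] = s(c). \qquad (3.77)$$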
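It is easy to see that, in view of Condition (C5), $s(c) > 0$ is well defined, and once again $\epsilon_n(c)$ is an exponential function of $n$. Let $\epsilon^* < 1$ be the unique solution $\epsilon$ to

$$\epsilon\Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big) = 1. \qquad (3.78)$$

Note that

$$\max\{I(t;P) : t \in \mathcal{P}_n\} = C_{\rm DIMC} - O(1/n).$$

Let $N(c)$ be the smallest integer $N > 0$ such that

$$\max\{I(t;P) : t \in \mathcal{P}_n\} > C_{\rm DIMC} - \frac{C_{\rm DIMC} - c}{2} \qquad (3.79)$$

for all $n \ge N$. Then we have the following Taylor-type expansion of $R_n(\epsilon)$.

Theorem 4. For any $n \ge N(c)$ and any $\max\{\epsilon_+, \epsilon_n(c)\} \le \epsilon < \epsilon^*$, let

$$t^* = \arg\max_{t\in\mathcal{P}_n(c)}\big[I(t;P) - \delta_{t,n}(\epsilon)\big] \qquad (3.80)$$

$$t_* = \arg\max_{t\in\mathcal{P}_n(c)}\Big[I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big]. \qquad (3.81)$$

Then

$$\big|R_n(\epsilon) - \big(I(t^*;P) - \delta_{t^*,n}(\epsilon)\big)\big| \le o(\delta_{t^*,n}(\epsilon)) \qquad (3.82)$$

where

$$o(\delta_{t^*,n}(\epsilon)) = r_-(t^*,\delta_{t^*,n}(\epsilon)) + \frac{(|\mathcal{X}| + 1.5)\ln(n+1) + d_1}{n} \qquad (3.83)$$

if $\epsilon < 1/2$, and

$$\Big|R_n(\epsilon) - \Big(I(t_*;P) - \frac{\sigma_D(t_*;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big)\Big| \le \frac{(|\mathcal{X}| + 1)\ln(n+1) + d_2}{n} \qquad (3.84)$$

otherwise, where $d_1$ and $d_2$ are constants depending on the channel, but independent of $n$ and $\epsilon$.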

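Proof: For any $t \in \mathcal{P}_n$ and $0 < \epsilon < 1$, let

$$\delta^t_n(\epsilon) = \sup\Big\{\delta > 0 : P_{t,\delta} \ge \Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big)\epsilon\Big\}.$$

By Theorem 3 and the trivial bound $P(\mathcal{B}_{t,n,\delta}) \le 1$, it is not hard to verify that

$$R_n(\epsilon) \le \max_{t\in\mathcal{P}_n}\big[I(t;P) - \delta^t_n(\epsilon)\big] - \frac{\ln\epsilon + \ln\frac{-2\ln\epsilon}{n}}{n} + \frac{\ln\Big(1 + \sqrt{\frac{-2\ln\epsilon}{n}}\Big)}{n} + \frac{|\mathcal{X}|\ln(n+1)}{n}. \qquad (3.85)$$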
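Let us now examine

$$\max_{t\in\mathcal{P}_n}\big[I(t;P) - \delta^t_n(\epsilon)\big].$$

In view of the Chernoff bound (see Theorem 8 in [6]),

$$P_{t,\delta} \le e^{-n r_-(t,\delta)}$$

for any $t \in \mathcal{P}_n$ and $\delta > 0$, which, together with (3.68), implies

$$\delta^t_n(\epsilon) \le r_-^{-1}\Big(t, -\frac{1}{n}\ln\Big[\Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big)\epsilon\Big]\Big) \qquad (3.86)$$

$$\le r_-^{-1}(t, s(c)) \qquad (3.87)$$

$$\le \frac{C_{\rm DIMC} - c}{2} \qquad (3.88)$$

whenever $\max\{\epsilon_+, \epsilon_n(c)\} \le \epsilon < \epsilon^*$. In the above derivation, (3.86) is due to (3.68); and (3.87) and (3.88) follow from (3.76), (3.77), and (3.78). Therefore,

$$\max_{t\in\mathcal{P}_n}\big[I(t;P) - \delta^t_n(\epsilon)\big] \ge \max_{t\in\mathcal{P}_n} I(t;P) - \frac{C_{\rm DIMC} - c}{2} > c \qquad (3.89)$$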



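where the last inequality is due to (3.79). In view of (3.89), it is not hard to see that for any $t \in \mathcal{P}_n$ achieving $\max_{t\in\mathcal{P}_n}[I(t;P) - \delta^t_n(\epsilon)]$,

$$I(t;P) > c + \delta^t_n(\epsilon) \ge c$$

and hence

$$\max_{t\in\mathcal{P}_n}\big[I(t;P) - \delta^t_n(\epsilon)\big] = \max_{t\in\mathcal{P}_n(c)}\big[I(t;P) - \delta^t_n(\epsilon)\big]$$

which, together with (3.85), implies

$$R_n(\epsilon) \le \max_{t\in\mathcal{P}_n(c)}\big[I(t;P) - \delta^t_n(\epsilon)\big] - \frac{\ln\epsilon + \ln\frac{-2\ln\epsilon}{n}}{n} + \frac{\ln\Big(1 + \sqrt{\frac{-2\ln\epsilon}{n}}\Big)}{n} + \frac{|\mathcal{X}|\ln(n+1)}{n}. \qquad (3.90)$$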



When $\epsilon \ge 1/2$, it follows from (3.24) and (3.54) that for any $t \in \mathcal{P}_n(c)$



> 



in 



2eV-21ne 



CBEMnjt; P) 



n 



n 



2eV-21ne 



CBEMr>{t-P)' 
aUt-P) , 



(3.91) 
Since $\mathcal{P}(c)$ is closed, it follows from Condition (C4) that $\sigma_D(t;P)$ and $\frac{M_D(t;P)}{\sigma^3_D(t;P)}$ are bounded over $\mathcal{P}(c)$. Plugging (3.91) into (3.90) yields

$$R_n(\epsilon) \le \max_{t\in\mathcal{P}_n(c)}\Big[I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big] + \frac{(|\mathcal{X}| + 1)\ln(n+1) + d}{n}$$

for some constant $d$, which, together with the achievability in (3.65) and (3.66), implies (3.84).
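Now let us focus on the case $\epsilon < 1/2$. For any $t \in \mathcal{P}(c)$, let $\hat\delta_{t,n}(\epsilon)$ be the unique solution $\delta$ to

$$\Big(1 + 2\sqrt{\frac{-2\ln\epsilon}{n}}\Big)\epsilon = \underline{\xi}_{D,-}(t;P,\lambda,n)\,e^{-n r_-(t,\delta)} \qquad (3.92)$$

where $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$. By following the argument in the proof of Theorem 2, it is not hard to verify that for any $t \in \mathcal{P}_n(c)$

$$\delta^t_n(\epsilon) \ge \hat\delta_{t,n}(\epsilon) \ge \delta_{t,n}(\epsilon) - \frac{d}{n} \qquad (3.93)$$

for some constant $d$ independent of $n$, $\epsilon$, and $t$. Plugging (3.93) into (3.90) then yields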

--^ + \X\\n{n + l) + d 



Rn{e) < I{t*; P) - (5t.,„(e) ^ + 



n 



n 



(3.94) 



In the meantime, 

e = gt*;P,n{^t'',n 
> 



where At.,„ = f^'^^ 



Ine 



. Consequently, 



(3.95) 



In 



n 



< r_(r,<5,,,„(e)) + 



2n n 



n 



(3.96) 



where the constant appearing in (3.96) is independent of $n$, $\epsilon$, and $t^*$. Now substituting (3.96) and $\epsilon < 1/2$ into (3.94) yields

Rn{e) < I{t*-P)-5t*,n{e) + r.{t\5t*,n{e)) 



+ 



In 2^ + r^i + Jr_(t*, 5i.,„(e)) + ^ + ^ + |lnn+|A'| ln(n + l) + d 



n 



< I{e- P) - 5i.,„(e) + r_(r , 5t*,n{e)) + 



^1+ (|A'| + |)ln(n + l) 



(3.97) 
for some constant independent of $n$, $\epsilon$, and $t^*$, where the last inequality is due to the fact that, in view of Condition (C4), $r_-(t^*,\delta_{t^*,n}(\epsilon))$ is bounded over $t \in \mathcal{P}(c)$ and $\epsilon \ge \max\{\epsilon_+, \epsilon_n(c)\}$.

To complete the proof, let us go back to the achievability given in (3.61) and (3.62). Now choose $t$ to be $t^*$, and follow the argument in the proof of Theorem 2. Then it is not hard to show that

$$R_n(\epsilon) \ge I(t^*;P) - \delta_{t^*,n}(\epsilon) - r_-(t^*,\delta_{t^*,n}(\epsilon)) - \frac{(|\mathcal{X}| + 1)\ln(n+1) + d_1}{n} \qquad (3.98)$$



where $d_1$ is a constant independent of $n$, $\epsilon$, and $t^*$. Combining (3.98) with (3.97) completes the proof of Theorem 4. ■

Remarks similar to those immediately after Theorem 2 also apply here. In particular, Theorem 4 and the achievability of jar decoding given in (3.61) and (3.62) to (3.65) and (3.66) once again imply that jar decoding is indeed optimal up to the second order coding performance in the non-asymptotic regime for any DIMC. In addition, the following remarks are helpful for the computation of the Taylor-type expansion of $R_n(\epsilon)$ as expressed in (3.80) to (3.84).



Remark 12. When $I(t;P)$, $\delta_-(t,\lambda)$, $\sigma^2_{D,-}(t;P,\lambda)$, $M_{D,-}(t;P,\lambda)$, $\hat{M}_{D,-}(t;P,\lambda)$, and $r_-(t,\delta_-(t,\lambda))$ are all continuously differentiable with respect to $t$ over $t \in \mathcal{P}(c)$ and $\lambda \in [0,\lambda^*]$, which is true for most channels, including in particular channels with discrete output alphabets and discrete input additive white Gaussian channels, $\mathcal{P}_n(c)$ in the definitions of $t^*$ and $t_*$ can be replaced by $\mathcal{P}(c)$. Thus, in this case,

$$t^* = \arg\max_{t\in\mathcal{P}(c)}\big[I(t;P) - \delta_{t,n}(\epsilon)\big] \qquad (3.99)$$

$$t_* = \arg\max_{t\in\mathcal{P}(c)}\Big[I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big]. \qquad (3.100)$$

Hereafter, we shall assume that the channel satisfies this continuous differentiability condition, and use (3.99) and (3.80), or (3.100) and (3.81), interchangeably.



Remark 13. It is worth pointing out the impact of $c$ on the maximization problems given in (3.99), (3.80), (3.100), and (3.81). In view of the definitions of $s(c)$ and $\epsilon_n(c)$ in (3.76) and (3.77), it is not hard to see that when $\epsilon$ is relatively large with respect to $n$ (in the sense that $-\frac{\ln\epsilon}{n}$ is small), one can select $c$ to be close to $C_{\rm DIMC}$. In this case, it suffices to search a small range $\mathcal{P}(c)$ for the optimal $t^*$. On the other hand, when $\epsilon$ is relatively small with respect to $n$, e.g., an exponential function of $n$, $c$ should be selected to be far below $C_{\rm DIMC}$, and hence one has to search a large range $\mathcal{P}(c)$ for the optimal $t^*$.



Remark 14. When the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4 is applied to the case of BIMSC, it yields essentially the same result as in Theorem 2, as explained below. For any BIMSC, $t(0)$ fully characterizes the type $t$. Then, by symmetry, $\frac{\partial[I(t;P) - \delta_{t,n}(\epsilon)]}{\partial t(0)} = 0$ at $t(0) = 0.5$ for any $n$ and $\epsilon$. Note that $\delta_{t,n}(\epsilon) = \delta_n(\epsilon)$ when $t(0) = 0.5$, the capacity achieving input distribution. Therefore,

$$\max_{t}\big[I(t;P) - \delta_{t,n}(\epsilon)\big] = C_{\rm BIMSC} - \delta_n(\epsilon) + O(\delta_n^2(\epsilon)). \qquad (3.101)$$

Consequently, by observing that the higher order term $o(\delta_{t^*,n}(\epsilon))$ in Theorem 4 is also of the order of $\delta_n^2(\epsilon)$, the Taylor-type expansion of $R_n(\epsilon)$ for BIMSC in Theorem 4 is shown to be the same as that in Theorem 2.

D. Comparison with Asymptotic Analysis and Implication 

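It is instructive to compare Theorem 4 with the second order asymptotic performance analysis as $n$ goes to $\infty$.

Asymptotic analysis with constant $0 < \epsilon < 1$ and $n \to \infty$: Fix $0 < \epsilon < 1$. It was shown in the literature (see, e.g., [3]) that for a DIMC with a discrete output alphabet and $C_{\rm DIMC} > 0$,

$$R_n(\epsilon) = C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{\ln n}{n}\Big) \qquad (3.102)$$

for sufficiently large $n$, where

$$\sigma_D = \begin{cases}\min\{\sigma_D(t;P) : t \in \mathcal{P},\ I(t;P) = C_{\rm DIMC}\} & \text{if } \epsilon < 1/2\\ \max\{\sigma_D(t;P) : t \in \mathcal{P},\ I(t;P) = C_{\rm DIMC}\} & \text{if } \epsilon \ge 1/2.\end{cases}$$

Once again, the expression $C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon)$ is referred to as the normal approximation for $R_n(\epsilon)$ in [5]. It is not hard to verify that for sufficiently large $n$,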



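$$C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon) \le \max_{t}\Big[I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big] = \max_{t:\,\exists p_X,\,|t - p_X| = O(1/\sqrt{n})}\Big[I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon)\Big] = C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{1}{n}\Big) \qquad (3.103)$$

where the first equality is due to the fact that for any $p_X$ satisfying $I(p_X;P) = C_{\rm DIMC}$ and $t$ satisfying $|t - p_X| = \omega(1/\sqrt{n})$,

$$I(t;P) - \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon) < C_{\rm DIMC} - \frac{\sigma_D(p_X;P)}{\sqrt{n}}Q^{-1}(\epsilon)$$

as

$$\frac{|Q^{-1}(\epsilon)|}{\sqrt{n}}\big|\sigma_D(t;P) - \sigma_D(p_X;P)\big| = O\Big(\frac{|t - p_X|}{\sqrt{n}}\Big) = o(|t - p_X|^2) = o(C_{\rm DIMC} - I(t;P)).$$

Therefore, when $\epsilon \ge 1/2$, (3.102) and (3.84) are essentially the same for sufficiently large $n$.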



Let us now look at the case $\epsilon < 1/2$. Again, $0 < \epsilon < 1/2$ is fixed. In parallel with (2.109) and (2.110), we have for each $t \in \mathcal{P}(c)$
and 



1 -Mnit-.P) ^, , ^^^4^ 

2al{t-p) Qa%it;P) 



$$\delta_{t,n}(\epsilon) = \frac{\sigma_D(t;P)}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{1}{n}\Big). \qquad (3.105)$$
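Combining (3.105) with (3.103) yields

$$C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon) + O(1/n) \le \max_{t}\big[I(t;P) - \delta_{t,n}(\epsilon)\big] \le C_{\rm DIMC} - \frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon) + O\Big(\frac{1}{n}\Big). \qquad (3.106)$$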



Thus the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4 implies the second order asymptotic analysis with constant $0 < \epsilon < 1$ and $n \to \infty$ shown in (3.102).

Asymptotic analysis with $n \to \infty$ and non-exponentially decaying $\epsilon$: Suppose now $\epsilon$ is a function of $n$ and goes to 0 as $n \to \infty$, but at a non-exponential speed. Using arguments similar to those made above and in Subsection II-D, one can show that the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4 implies that in this case, $C_{\rm DIMC}$ and $-\frac{\sigma_D}{\sqrt{n}}Q^{-1}(\epsilon)$ are still respectively the first order and second order terms of $R_n(\epsilon)$ in the asymptotic analysis with $n \to \infty$. Once again, to the best of our knowledge, the second order asymptotic analysis with $n \to \infty$ and non-exponentially decaying $\epsilon$ has not been addressed before in the literature.

Divergence from the normal approximation: In the non-asymptotic regime where $n$ is finite and $\epsilon$ is generally relatively small with respect to $n$, the first two terms

$$\max_{t}\big[I(t;P) - \delta_{t,n}(\epsilon)\big]$$

in the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4 differ from the normal approximation in a strong way. In particular, the optimal distribution $t^*$ defined in (3.99) is not necessarily a capacity achieving distribution. In this case, the normal approximation would fail to provide a reasonable estimate for $R_n(\epsilon)$.

Example: Consider the Z channel shown in Figure 1. In this example, we show that the optimal distribution $t^*$ defined in (3.99) is not a capacity achieving distribution. In the numerical calculation shown in Figure 2, the transition probability p (i.e., $\Pr\{Y = 1 | X = 0\}$) ranges from 0.05 to 0.95, with block length n = 1000 and error probability $\epsilon = 10^{-\ldots}$. As can be seen from Figure 2(a), $t^*(0)$ is always different from the capacity achieving $p_X(0)$. Moreover, Figure 2(b) shows the percentage of $I(t;P) - \delta_{t,n}(\epsilon)$ over $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ when t is the capacity achieving distribution, $t^*$, and the uniform distribution, respectively. It is clear that $C_{DIMC} - \delta_{p_X,n}(\epsilon)$, where $p_X$ is the capacity achieving distribution, moves further and further away from $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ as p gets larger. This indicates that, under practical block length and error probability requirements, Shannon random coding based on the capacity achieving distribution is not optimal. It is also interesting to note that for uniform t, $I(t;P) - \delta_{t,n}(\epsilon)$ is quite close to $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ over the whole range, implying that linear block coding is quite suitable for the Z channel even under practical block length and error probability requirements.

Fig. 1. Z Channel

Implication on code design: An important implication arising from the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4 in the non-asymptotic regime is that, for values of n and $\epsilon$ of practical interest, the optimal marginal codeword symbol distribution is not necessarily a capacity achieving distribution. This is illustrated above for the Z channel. Indeed, other than for symmetric channels like BIMSC, one would expect that the optimal distribution $t^*$ defined in (3.99) is in general not a capacity achieving distribution for values of n and $\epsilon$ for which $\delta_{t^*,n}(\epsilon)$ is not relatively small. As such, to design efficient channel codes under practical block length and error probability requirements, one approach is to solve the maximization problem in (3.99), get $t^*$, and then design codes so that the marginal codeword symbol distribution is approximately $t^*$.

[Figure 2: (a) t(0) vs. p; (b) I(t;P) − δ_{t,n}(ε) for the capacity achieving, optimal (t*), and uniform (linear) input distributions.]
Fig. 2. Illustration for the Z channel with n = 1000 and $\epsilon = 10^{-\ldots}$: (a) comparison of $t^*$ with the capacity achieving distribution; and (b) comparison of $I(t;P) - \delta_{t,n}(\epsilon)$ among different distributions t.



IV. Approximation and Evaluation 

Based on our converse theorems and the Taylor-type expansion of $R_n(\epsilon)$, in this section we first derive two approximation formulas for $R_n(\epsilon)$. We then compare them numerically with the normal approximation and with some tight (achievable and converse) non-asymptotic bounds for the BSC, BEC, BIAGC, and Z channel. In all of Figures 3 to 11, rates are expressed in bits.



A. Approximation Formulas 

In view of the Taylor-type expansion of $R_n(\epsilon)$ in Theorem 4, one reasonable approximation formula is to use the first two terms of the Taylor-type expansion of $R_n(\epsilon)$ as an estimate for $R_n(\epsilon)$. We refer to this formula as the second order (SO) formula:

$$R^{SO}_n(\epsilon) = \max_{t \in \mathcal{P}(c)}\left[I(t;P) - \delta_{t,n}(\epsilon)\right] = I(t^*;P) - \delta_{t^*,n}(\epsilon) \qquad (4.1)$$

where c is selected according to Remark 13.



To derive the other approximation formula for $R_n(\epsilon)$, let us put Theorem 3, Theorem 4, and the achievability given in (3.61) and (3.62) together. It would make sense for an optimal code of block length n to draw all its codewords from the same type t with $|t - t^*| = O(1/n)$. In this case, it is not hard to see that the term involving $|\mathcal{X}|\ln(n+1)$ in the bounds of Theorems 3 and 4 (i.e., (3.33), (3.35), (3.83), and (3.84)) can be dropped. By further ignoring the higher order terms in (3.33) and (3.35), we get the following approximation formula (dubbed "NEP"):

$$R^{NEP}_n(\epsilon) = I(t^*;P) - \delta_{t^*,n}(\epsilon) - \frac{\ln\epsilon}{n} + \frac{1}{n}\ln P\left(B_{t^*,n,\delta_{t^*,n}(\epsilon)}\right). \qquad (4.2)$$

Rewrite the normal approximation as

$$R^{Normal}_n(\epsilon) = C_{DIMC} - \frac{\sigma(p_X;P)}{\sqrt{n}}Q^{-1}(\epsilon). \qquad (4.3)$$



B. BIMSC 

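In the case of BIMSC, it follows from Theorem 2 and Remark 14 that $R^{SO}_n(\epsilon)$, $R^{NEP}_n(\epsilon)$, and $R^{Normal}_n(\epsilon)$ become, respectively,

$$R^{SO}_n(\epsilon) = C_{BIMSC} - \delta_n(\epsilon),$$

$$R^{NEP}_n(\epsilon) = C_{BIMSC} - \delta_n(\epsilon) - \frac{\ln\epsilon}{n} + \frac{1}{n}\ln P\left(B_{n,\delta_n(\epsilon)}\right) \qquad (4.4)$$

and

$$R^{Normal}_n(\epsilon) = C_{BIMSC} - \frac{\sigma_H(X|Y)}{\sqrt{n}}Q^{-1}(\epsilon). \qquad (4.5)$$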
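For the BSC, the parts of (4.4) and (4.5) that do not involve $\delta_n(\epsilon)$ can be evaluated directly. The Python sketch below is illustrative only. It assumes $\sigma_H^2(X|Y) = p(1-p)\ln^2\frac{1-p}{p}$ (the variance of $-\ln p(X|Y)$ for the BSC with uniform input), computes the Normal approximation (4.5) in bits, optionally adds the $(\ln n)/(2n)$ term used for the "Normal_ln" curve of the BSC discussion, and applies the $-\frac{\ln\epsilon}{n} + \frac{1}{n}\ln P(B_{n,\delta_n(\epsilon)})$ correction of (4.4) to a value of $R^{SO}_n(\epsilon)$ supplied by the caller. Computing $\delta_n(\epsilon)$ itself requires $g_{X|Y,n}$ and is not attempted; all function names are hypothetical.

```python
import math
from statistics import NormalDist

LN2 = math.log(2.0)


def q_inv(eps):
    """Q^{-1}(eps), the inverse Gaussian tail function: Q^{-1}(eps) = -Phi^{-1}(eps)."""
    return -NormalDist().inv_cdf(eps)


def bsc_capacity_and_sigma(p):
    """Capacity (nats) and sigma_H(X|Y) (nats) of a BSC with cross-over probability p.

    With uniform input, -ln p(X|Y) is -ln(1-p) w.p. 1-p and -ln p w.p. p,
    so its variance is p(1-p) ln^2((1-p)/p).
    """
    capacity = LN2 + p * math.log(p) + (1 - p) * math.log(1 - p)
    sigma = math.sqrt(p * (1 - p)) * abs(math.log((1 - p) / p))
    return capacity, sigma


def r_normal_bits(p, n, eps, add_ln_term=False):
    """(4.5) for the BSC in bits; add_ln_term=True adds (ln n)/(2n) ("Normal_ln")."""
    capacity, sigma = bsc_capacity_and_sigma(p)
    rate = capacity - sigma / math.sqrt(n) * q_inv(eps)
    if add_ln_term:
        rate += math.log(n) / (2 * n)
    return rate / LN2


def r_nep_nats(r_so, n, eps, prob_b=1.0):
    """(4.4): R^SO - (ln eps)/n + (1/n) ln P(B), in nats.

    r_so = C_BIMSC - delta_n(eps) must be supplied by the caller; prob_b = 1.0
    corresponds to the trivial bound P(B) <= 1.
    """
    return r_so - math.log(eps) / n + math.log(prob_b) / n


for n in (200, 1000, 2000):
    print(n, round(r_normal_bits(0.11, n, 1e-3), 4),
          round(r_normal_bits(0.11, n, 1e-3, add_ln_term=True), 4))
```

Passing `prob_b=eps` makes the two correction terms in `r_nep_nats` cancel, which is the situation discussed for the BEC in connection with (4.7).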



From Theorem 2 and its comparison with asymptotic analysis, we can expect that when $\delta_n(\epsilon)$ is extremely small, $R^{SO}_n(\epsilon)$ and $R^{Normal}_n(\epsilon)$ are close, and both can provide a good approximation for $R_n(\epsilon)$. However, as $\delta_n(\epsilon)$ increases, the relative position of $R^{SO}_n(\epsilon)$ and $R^{Normal}_n(\epsilon)$ depends on

$$\zeta_{X|Y} = \frac{M_H(X|Y)}{6\sigma_H^4(X|Y)}.$$

Specifically, for a channel with a large magnitude of $\zeta_{X|Y}$, $R^{Normal}_n(\epsilon)$ is not reliable, as it can be much below achievable bounds or above converse bounds. On the other hand, as shown later on, $R^{SO}_n(\epsilon)$ is much more reliable. Moreover, $R^{NEP}_n(\epsilon)$, which has some terms beyond second order on top of $R^{SO}_n(\epsilon)$, always provides a good approximation for $R_n(\epsilon)$ even if $\delta_n(\epsilon)$ is relatively large.
1) BSC: For this channel, the trivial bound $P(B_{n,\delta_n(\epsilon)}) \le 1$ is applied in the evaluation of $R^{NEP}_n(\epsilon)$. Before comparing these approximations, let us first gain some insight by investigating $\zeta_{X|Y}$. It can be easily verified that for the BSC with cross-over probability p,

$$\zeta_{X|Y} = -\frac{1}{6\ln\frac{1-p}{p}} \cdot \frac{1-2p}{p(1-p)}. \qquad (4.6)$$

As can be seen, $\zeta_{X|Y}$ is always negative for any $p \in (0,1)$, and $\zeta_{X|Y} \to -\infty$ as $p \to 0$. Therefore, in the case of a very small p, $R^{Normal}_n(\epsilon)$ will be larger than $R^{SO}_n(\epsilon)$ by a relatively large margin, and even larger than the converse bound.
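The closed form (4.6) can be checked numerically against the distribution of $-\ln p(X|Y)$. The sketch below assumes that $M_H(X|Y)$ corresponds to $-\kappa_3$, i.e. $\zeta_{X|Y} = -\kappa_3/(6\sigma_H^4(X|Y))$ with $\kappa_3$ the third central moment of $-\ln p(X|Y)$. This sign convention is our assumption, chosen because it reproduces (4.6); `zeta_from_pmf` and the other names are hypothetical.

```python
import math


def zeta_from_pmf(values, probs):
    """Assumed convention: zeta = -kappa3 / (6 sigma^4) for a discrete random
    variable (here -ln p(X|Y)) with the given values and probabilities."""
    mean = sum(q * v for v, q in zip(values, probs))
    var = sum(q * (v - mean) ** 2 for v, q in zip(values, probs))
    kappa3 = sum(q * (v - mean) ** 3 for v, q in zip(values, probs))
    return -kappa3 / (6.0 * var ** 2)


def zeta_bsc_numeric(p):
    # No flip (prob 1-p): -ln p(X|Y) = -ln(1-p); flip (prob p): -ln p.
    return zeta_from_pmf([-math.log(1 - p), -math.log(p)], [1 - p, p])


def zeta_bsc_closed_form(p):
    # Closed form (4.6).
    return -(1.0 / (6.0 * math.log((1 - p) / p))) * (1 - 2 * p) / (p * (1 - p))


for p in (0.3, 0.11, 0.01, 0.001):
    print(f"p={p}: numeric={zeta_bsc_numeric(p):.4f}, closed form={zeta_bsc_closed_form(p):.4f}")
```

The loop illustrates the trend described above: the magnitude of $\zeta_{X|Y}$ grows quickly as p decreases toward the value 0.001 used in Figure 4.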

Now, in order to compare these approximations, we invoke Theorem 33 (dubbed "RCU") and Theorem 35 (dubbed "Converse") in [5], which serve as an achievable bound and a converse bound, respectively. In addition, another converse bound is provided by the exact calculation of the corresponding expressions in Corollary 1 (dubbed "Exact"). Moreover, by Theorem 52 in [4], $\frac{\ln n}{2n}$ is the third order term in the asymptotic analysis of $R_n(\epsilon)$ as $n \to \infty$ for the BSC; therefore, another approximation is obtained by adding $\frac{\ln n}{2n}$ to the normal approximation (dubbed "Normal_ln"). These four approximation formulas (NEP, Normal_ln, Normal, SO), two converse bounds (Converse, Exact), and one achievable bound (RCU) are then compared against each other for block lengths n ranging from 200 to 2000; their respective performance is shown in Figures 3 and 4.

In Figure 3, the target channel is the BSC with cross-over probability 0.11, for which $\zeta_{X|Y}$ is relatively small. In Figure 3(a), the bounds are compared with a fixed maximum error probability $P_m = 10^{-\ldots}$, while $\delta_n(\epsilon)$ changes with block length n, as shown in Figure 3(b). Meanwhile, Figure 3(c) compares these bounds when $\delta_n(\epsilon)$ is fixed at 0.06, with the corresponding $P_m = g_{X|Y,n}(0.06)$ shown in Figure 3(d). As can be seen, as $\delta_n(\epsilon)$ gets smaller, the SO and Normal curves tend to coincide. Moreover, since the SO and Normal approximation formulas are quite close in this case, both NEP and Normal_ln provide quite accurate approximations for $R_n(\epsilon)$, with NEP slightly better.

Figure 4 shows the same curves as Figure 3, but for the BSC with cross-over probability 0.001. In this case, the magnitude of $\zeta_{X|Y}$ is large, and therefore the SO and Normal curves are well apart. In fact, the Normal curve is even above both converse bounds, and so is the Normal_ln curve, thus confirming the analysis based on $\zeta_{X|Y}$ made at the beginning of this discussion of the BSC. On the other hand, the SO curve stays at the same relative position to the achievable and converse bounds, and NEP still provides an accurate approximation for $R_n(\epsilon)$.

[Figure 3 panels, each plotted against block length n from 200 to 2000: (a) bounds with P_m = 10^{-…}; (b) δ_n(ε) with P_m = 10^{-…}; (c) bounds with P_m = g_{X|Y,n}(δ) and δ = 0.06; (d) log10 P_m with P_m = g_{X|Y,n}(δ) and δ = 0.06.]
Fig. 3. Comparison of different bounds for BSC with p = 0.11.

[Figure 4 panels, each plotted against block length n from 200 to 2000: (a) bounds with P_m = 10^{-…}; (b) δ_n(ε) with P_m = 10^{-…}; (c) bounds with P_m = g_{X|Y,n}(δ) and δ = 0.04; (d) log10 P_m with P_m = g_{X|Y,n}(δ) and δ = 0.04. Legend: NEP, Normal_ln, RCU, Exact, Converse, SO, Normal.]
Fig. 4. Comparison of different bounds for BSC with p = 0.001.

2) BEC: This special channel serves as another interesting example to illustrate the difference between the SO and Normal approximations. On one hand, it can be easily verified that

$$P(B_{n,\delta}) = \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} = g_{X|Y,n}(\delta) \qquad (4.7)$$

and therefore $-\frac{\ln\epsilon}{n}$ and $\frac{1}{n}\ln P(B_{n,\delta_n(\epsilon)})$ cancel out in $R^{NEP}_n(\epsilon)$, which is then identical to $R^{SO}_n(\epsilon)$. On the other hand,

$$\zeta_{X|Y} \; \begin{cases} < 0 & \text{if } p < 0.5 \\ = 0 & \text{if } p = 0.5 \\ > 0 & \text{if } p > 0.5. \end{cases} \qquad (4.8)$$



Therefore, the Normal curve can be all over the map: it can be above some converse bound when p < 0.5, and below an achievable bound when p > 0.5. When p = 0.5, the Normal curve happens to be close to the SO curve, which explains why it provides an accurate approximation for $R_n(\epsilon)$ in this particular case, as shown in [5].
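The sign pattern in (4.8) can be reproduced with the same assumed convention as for the BSC, $\zeta_{X|Y} = -\kappa_3/(6\sigma_H^4(X|Y))$. With uniform input, $-\ln p(X|Y)$ equals $\ln 2$ on an erasure (probability p) and 0 otherwise, which under that assumption gives $\zeta_{X|Y} = -\frac{1-2p}{6\ln 2\, p(1-p)}$. This closed form is our own derivation, not quoted from the paper, and the function names are hypothetical.

```python
import math


def zeta_from_pmf(values, probs):
    """Assumed convention: zeta = -kappa3 / (6 sigma^4) for -ln p(X|Y)."""
    mean = sum(q * v for v, q in zip(values, probs))
    var = sum(q * (v - mean) ** 2 for v, q in zip(values, probs))
    kappa3 = sum(q * (v - mean) ** 3 for v, q in zip(values, probs))
    return -kappa3 / (6.0 * var ** 2)


def zeta_bec(p):
    # Erasure (prob p): X|Y uniform, so -ln p(X|Y) = ln 2; otherwise 0.
    return zeta_from_pmf([math.log(2.0), 0.0], [p, 1 - p])


def zeta_bec_closed_form(p):
    # Derived under the assumed convention above.
    return -(1 - 2 * p) / (6.0 * math.log(2.0) * p * (1 - p))


for p in (0.05, 0.5, 0.9):
    print(f"p={p}: zeta={zeta_bec(p):.4f}")
```

The two erasure probabilities used in Figures 5 and 6 (0.05 and 0.9) fall on opposite sides of p = 0.5, so they exercise both signs of $\zeta_{X|Y}$.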

To provide benchmarks for the comparison of approximation formulas, Theorems 37 and 38 in [5] are used here, dubbed "DT" and "Converse", respectively. The exact calculation of (2.60) and (2.61) in Corollary 1 (dubbed "Exact") again serves as an additional converse bound. These bounds are drawn in Figures 5 and 6 in the same way as in Figure 3, where the erasure probabilities are selected to be 0.05 and 0.9, respectively. Once again, the numerical results confirm our analysis and discussion above.



[Figure 5 panels, each plotted against block length n from 200 to 2000: (a) bounds with P_m = 10^{-…}; (b) δ_n(ε) with P_m = 10^{-…}; (c) bounds with P_m = g_{X|Y,n}(δ) and δ = 0.0199; (d) log10 P_m with P_m = g_{X|Y,n}(δ) and δ = 0.0199.]
Fig. 5. Comparison of different bounds for BEC with p = 0.05.



[Figure 6 panels, each plotted against block length n from 200 to 2000: (a) bounds with P_m = 10^{-…}; (b) δ_n(ε) with P_m = 10^{-…}; (c) bounds with P_m = g_{X|Y,n}(δ) and δ = 0.022; (d) log10 P_m with P_m = g_{X|Y,n}(δ) and δ = 0.022.]
Fig. 6. Comparison of different bounds for BEC with p = 0.9.

3) BIAGC: Here we assume that codewords are modulated to {+1, −1} before going through an AWGN channel, and we apply the trivial bound $P(B_{n,\delta_n(\epsilon)}) \le 1$ in the NEP formula. As with the BSC and BEC, we would like to gain some insight by investigating $\zeta_{X|Y}$. Since in this case $\zeta_{X|Y}$ does not seem to have a simple closed-form expression that can be easily computed, a numerical calculation of $\zeta_{X|Y}$ is shown in Figure 7, where the SNR ranges from 8 dB to 10.5 dB. As can be seen, BIAGC is similar to BSC: $\zeta_{X|Y}$ is always negative and its magnitude increases with SNR. Therefore, $R^{Normal}_n(\epsilon)$ is close to $R^{SO}_n(\epsilon)$ when the SNR is low, but can be above some converse bounds when the SNR is high. This is confirmed in Figures 8 and 9, where the exact evaluation of (2.62) and (2.63) in Corollary 1 (dubbed "Exact") serves as a converse bound.
[Figure 7: ζ_{X|Y} vs. SNR.]
Fig. 7. $\zeta_{X|Y}$ of BIAGC
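Because $\zeta_{X|Y}$ has no simple closed form for the BIAGC, it has to be evaluated numerically. The sketch below shows one way to do this; it is not necessarily the computation behind Figure 7. It assumes unit-energy ±1 signalling with SNR = $1/\sigma^2$ (the paper's SNR convention may differ, e.g. by a factor of 2), the same assumed convention $\zeta_{X|Y} = -\kappa_3/(6\sigma_H^4(X|Y))$ used for the BSC and BEC, and a plain trapezoidal rule over a truncated Gaussian grid. The function names and grid parameters are hypothetical.

```python
import math


def softplus(u: float) -> float:
    """Numerically stable ln(1 + e^u)."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def zeta_biagc(snr_db: float, num_points: int = 40001, span: float = 12.0) -> float:
    """Approximate zeta_{X|Y} for +1/-1 signalling over AWGN.

    Assumptions (not taken from the paper): SNR = 1 / noise variance, and
    zeta = -kappa3 / (6 sigma^4) for the random variable -ln p(X|Y).
    By symmetry we condition on X = +1, so Y ~ N(1, s2) and
    -ln p(X|Y) = ln(1 + exp(-2Y / s2)).
    """
    s2 = 10.0 ** (-snr_db / 10.0)                 # noise variance
    sd = math.sqrt(s2)
    lo = 1.0 - span * sd                           # truncate Y to 1 +/- span*sd
    step = 2.0 * span * sd / (num_points - 1)
    weights, values = [], []
    for k in range(num_points):
        y = lo + k * step
        pdf = math.exp(-0.5 * ((y - 1.0) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        trap = 0.5 if k in (0, num_points - 1) else 1.0   # trapezoidal weights
        weights.append(trap * step * pdf)
        values.append(softplus(-2.0 * y / s2))
    total = sum(weights)                            # close to 1; renormalise anyway
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    kappa3 = sum(w * (v - mean) ** 3 for w, v in zip(weights, values)) / total
    return -kappa3 / (6.0 * var ** 2)


for snr_db in (-3.52, 0.0, 5.0, 9.63):
    print(f"SNR = {snr_db:6.2f} dB: zeta ~ {zeta_biagc(snr_db):.3f}")
```

Central moments are computed in two passes around the mean rather than from raw moments, which avoids cancellation at high SNR where $-\ln p(X|Y)$ is almost always close to zero.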



C. DIMC: Z Channel 

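To show an example of a DIMC which is not a BIMSC, we consider again the Z channel shown in Figure 1. The capacity of the Z channel is well known and given by

$$C_Z = \ln\left(1 + (1-p)\,p^{\frac{p}{1-p}}\right) \qquad (4.9)$$

with the capacity-achieving distribution

$$p_X(x) = \begin{cases} \dfrac{1}{1-p+p^{-\frac{p}{1-p}}} & \text{for } x = 0 \\[2ex] \dfrac{p^{-\frac{p}{1-p}} - p}{1-p+p^{-\frac{p}{1-p}}} & \text{for } x = 1 \end{cases} \qquad (4.10)$$

and the corresponding output distribution

$$p_Y(y) = \begin{cases} \dfrac{1-p}{1-p+p^{-\frac{p}{1-p}}} & \text{for } y = 0 \\[2ex] \dfrac{p^{-\frac{p}{1-p}}}{1-p+p^{-\frac{p}{1-p}}} & \text{for } y = 1. \end{cases} \qquad (4.11)$$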
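Formulas (4.9) and (4.10) can be sanity-checked by computing $I(t;P) = H(Y) - H(Y|X)$ directly. The sketch below assumes the Z channel convention used in this paper ($\Pr\{Y=1|X=0\} = p$, while input 1 always produces output 1), works in nats, and uses hypothetical function names. It also evaluates the uniform input, which the example around Figure 2 relates to linear codes; note that mutual information alone ignores the $\delta_{t,n}(\epsilon)$ term, so this does not reproduce $t^*$.

```python
import math


def h_nats(x):
    """Binary entropy in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def mutual_info_z(t0, p):
    """I(t;P) in nats for the Z channel with Pr{Y=1|X=0} = p and
    Pr{Y=1|X=1} = 1, where t0 = t(0) is the probability of input 0."""
    prob_y0 = t0 * (1 - p)                    # only input 0 can produce output 0
    return h_nats(prob_y0) - t0 * h_nats(p)   # H(Y) - H(Y|X)


def capacity_z(p):
    """Capacity C_Z from (4.9)."""
    return math.log(1 + (1 - p) * p ** (p / (1 - p)))


def capacity_achieving_t0(p):
    """p_X(0) from (4.10)."""
    return 1.0 / (1 - p + p ** (-p / (1 - p)))


for p in (0.05, 0.5, 0.9):
    t0 = capacity_achieving_t0(p)
    print(f"p={p}: C_Z={capacity_z(p):.5f}, I(p_X;P)={mutual_info_z(t0, p):.5f}, "
          f"p_X(0)={t0:.4f}, I(uniform;P)={mutual_info_z(0.5, p):.5f}")
```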

To calculate $R^{NEP}_n(\epsilon)$, $P(B_{t,n,\delta})$ needs to be investigated further. An interesting observation is that, given $x^n$ of type t, $\ln p(y^n|x^n) > -\infty$ if and only if $y_i = 1$ whenever $x_i = 1$, and its value depends only on the number of $y_i$ equal to 1 for $i \in \{j : x_j = 0\}$. One can then verify that

$$B_{t,n,\delta} = \left\{y^n : \frac{1}{n}\left|\{i : y_i = 0\}\right| < q_t(0) - \cdots\right\} \qquad (4.12)$$
[Figure 8 panels: bounds for BIAGC with SNR = −3.52 dB and P_m = 10^{-…}; and with P_m = g_{X|Y,n}(δ), δ = 0.0265.]
Fig. 8. Comparison of different bounds for BIAGC with SNR = -3.52 dB.



When $q_t(0) \neq 0.5$,

$$P(B_{t,n,\delta}) = \begin{cases} \Pr\left\{-\frac{1}{n}\ln q_t(Y_t^n) < H(Y_t) - \cdots\right\} & \text{if } q_t(0) < 0.5 \\ \Pr\left\{-\frac{1}{n}\ln q_t(Y_t^n) > H(Y_t) - \cdots\right\} & \text{if } q_t(0) > 0.5 \end{cases} \qquad (4.13)$$

where $Y_t$ is a random variable with distribution $q_t$. Consequently, we can apply the left NEP [6], the Chernoff bound, and the right NEP [6] with respect to entropy to upper bound $P(B_{t,n,\delta})$ when $q_t(0) <, =, > 0.5$, respectively.

[Figure 9 panels: bounds for BIAGC with SNR = 9.63 dB and P_m = 10^{-…} vs. block length n; (c) bounds with P_m = g_{X|Y,n}(δ) and δ = 0.0175; (d) log10 P_m with P_m = g_{X|Y,n}(δ) and δ = 0.0175.]
Fig. 9. Comparison of different bounds for BIAGC with SNR = 9.63 dB.

To provide benchmarks for the comparison of approximation formulas, the exact evaluation of (3.55) (with the term involving $|\mathcal{X}|\ln(n+1)$ dropped and $t = t^*$) and (3.56), dubbed "Exact", serves as a converse bound, and Theorem 22 in [5] provides an achievable bound, dubbed "DT" and given below:



Pm<Yl 

i=0 



m 
i 



[1 - p)"^ Vmin < 



1,(M-1 




(4.14) 



where $M = 2^{nR}$ and $m = t^*(0)n$. Figures 10 and 11 again show that the Normal curve is all over the map, while the NEP curve always lies between the DT achievable curve and the Exact converse curve. It is also worth pointing out that if the capacity achieving distribution $t = p_X$ instead of $t^*$ were used in the calculation of the Exact and DT bounds, both of them would be lower, confirming our earlier discussion that in the practical, non-asymptotic regime the optimal marginal codeword symbol distribution is not necessarily a capacity achieving distribution.



[Figure 10: bounds vs. block length n for the Z channel with p = 0.001 and P_m = 10^{-…}.]
Fig. 10. Comparison of different bounds for Z Channel with p = 0.001.

[Figure 11: bounds vs. block length n for the Z channel with p = 0.9 and P_m = 10^{-…}.]
Fig. 11. Comparison of different bounds for Z Channel with p = 0.9.

V. Conclusion

In this paper, we have developed a new converse proof technique dubbed the outer mirror image of jar and used it to establish new non-asymptotic converses for any discrete input memoryless channel with discrete or continuous output. Combining these non-asymptotic converses with the non-asymptotic achievability proved in [1] and [2] under jar decoding and with the NEP technique developed recently in [6], we have characterized the best coding rate $R_n(\epsilon)$ achievable with finite block length n and error probability $\epsilon$ by introducing a quantity $\delta_{t,n}(\epsilon)$ that measures the relative magnitude of the error probability $\epsilon$ and block length n with respect to a given channel P and an input distribution t. We have shown that in the non-asymptotic regime where both n and $\epsilon$ are finite, $R_n(\epsilon)$ has a Taylor-type expansion with respect to $\delta_{t,n}(\epsilon)$, where the first two terms of the expansion are $\max_t[I(t;P) - \delta_{t,n}(\epsilon)]$, which is equal to $I(t^*;P) - \delta_{t^*,n}(\epsilon)$ for some optimal distribution $t^*$, and the third order term of the expansion is $O(\delta^2_{t^*,n}(\epsilon))$ whenever $\delta_{t^*,n}(\epsilon) = \Omega(\sqrt{\ln n / n})$. Based on the new non-asymptotic converses and the Taylor-type expansion of $R_n(\epsilon)$, we have also derived two approximation formulas (dubbed "SO" and "NEP") for $R_n(\epsilon)$. These formulas have been evaluated and compared against some of the best bounds known so far, as well as against the normal approximation revisited recently in the literature. It turns out that while the normal approximation is all over the map, i.e., sometimes below achievability and sometimes above converse, the SO approximation is much more reliable and stays at the same relative position to the achievable and converse bounds; meanwhile, the NEP approximation is the best among the three and always provides an accurate estimate of $R_n(\epsilon)$.

It is expected that in the non-asymptotic regime where both n and $\epsilon$ are finite, the Taylor-type expansion of $R_n(\epsilon)$ and the NEP approximation formula would play a role similar to that of Shannon capacity [8] in the asymptotic regime as $n \to \infty$. For values of n and $\epsilon$ of practical interest for which $\delta_{t^*,n}(\epsilon)$ is not relatively small, the optimal distribution $t^*$ achieving $\max_t[I(t;P) - \delta_{t,n}(\epsilon)]$ is in general not a capacity achieving distribution, except for symmetric channels such as binary input memoryless symmetric channels. As a result, an important implication arising from the Taylor-type expansion of $R_n(\epsilon)$ is that in the practical non-asymptotic regime, the optimal marginal codeword symbol distribution is not necessarily a capacity achieving distribution. Therefore, it will be interesting to examine all practical channel codes proposed so far against the Taylor-type expansion of $R_n(\epsilon)$ and the NEP approximation formula, and to see how far their performance is from that predicted by them. If the performance gap is significant, one way to design a better channel code under practical block length and error probability requirements is to solve the maximization problem $\max_t[I(t;P) - \delta_{t,n}(\epsilon)]$, get $t^*$, and then design a code whose marginal codeword symbol distribution is approximately $t^*$.

Finally, we conclude this paper by saying a few words on non-asymptotic information theory. From the viewpoint of stochastic processes, most classic results in information theory are based, to a large extent, on the strong and weak laws of large numbers and on large deviation theory. For example, most first order asymptotic coding rate results in information theory were established through the application of asymptotic equipartition properties and typical sequences [9], which in turn depend on the strong and weak laws of large numbers. On the other hand, error exponent analysis in both source and channel coding is in the spirit of large deviation theory. The recent second order asymptotic coding rate results [3], [4], [5] depend heavily on the Berry-Esseen central limit theorem. In the non-asymptotic regime of practical interest, however, none of these probabilistic tools can be applied directly. To fill this void, we have developed the NEP in [6]. Based on the NEP, we have further invented jar decoding in [1] and presented the outer mirror image of jar converse proof technique in this paper. As demonstrated in this paper along with [1] and [6], the NEP, jar decoding, and the outer mirror image of jar together form a set of essential techniques needed for non-asymptotic information theory. They can also be extended and applied to help develop non-asymptotic multi-user information theory.